2023-12-21 10:53:09,258 INFO [train.py:953] (1/4) Training started
2023-12-21 10:53:09,258 INFO [train.py:963] (1/4) Device: cuda:1
2023-12-21 10:53:09,259 INFO [train.py:965] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-dirty', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-21 10:53:09,259 INFO [train.py:967] (1/4) About to create model
2023-12-21 10:53:14,335 INFO [train.py:971] (1/4) Number of model parameters: 64264454
2023-12-21 10:53:16,885 INFO [train.py:986] (1/4) Using DDP
2023-12-21 10:53:17,380 INFO [at_datamodule.py:398] (1/4) About to get the audioset cuts for KD.
2023-12-21 10:53:17,450 INFO [at_datamodule.py:223] (1/4) Enable MUSAN
2023-12-21 10:53:17,450 INFO [at_datamodule.py:224] (1/4) About to get Musan cuts
2023-12-21 10:53:19,861 INFO [at_datamodule.py:248] (1/4) Enable SpecAugment
2023-12-21 10:53:19,862 INFO [at_datamodule.py:249] (1/4) Time warp factor: 80
2023-12-21 10:53:19,862 INFO [at_datamodule.py:259] (1/4) Num frame mask: 10
2023-12-21 10:53:19,862 INFO [at_datamodule.py:272] (1/4) About to create train dataset
2023-12-21 10:53:19,862 INFO [at_datamodule.py:299] (1/4) Using DynamicBucketingSampler.
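The datamodule messages above map onto lhotse calls. The following is a minimal sketch (not the recipe's actual at_datamodule.py) of the pieces named in the log, using values from the config dump; the manifest filename and the SpecAugment feature-mask arguments are assumptions, since the log only prints the time-warp factor and the number of frame masks.

    from lhotse import load_manifest_lazy
    from lhotse.dataset import DynamicBucketingSampler, SpecAugment

    # Hypothetical manifest name; the config only gives manifest_dir=data/fbank.
    cuts_train = load_manifest_lazy("data/fbank/cuts_train.jsonl.gz")

    spec_augment = SpecAugment(
        time_warp_factor=80,    # "Time warp factor: 80"
        num_frame_masks=10,     # "Num frame mask: 10"
        features_mask_size=27,  # assumed, not printed in the log
        num_feature_masks=2,    # assumed
        frames_mask_size=100,   # assumed
    )

    train_sampler = DynamicBucketingSampler(
        cuts_train,
        max_duration=1000,  # 'max_duration': 1000 (seconds of audio per batch)
        num_buckets=30,     # 'num_buckets': 30
        shuffle=True,       # 'shuffle': True
        drop_last=True,     # 'drop_last': True
    )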
2023-12-21 10:53:22,237 INFO [at_datamodule.py:315] (1/4) About to create train dataloader
2023-12-21 10:53:22,239 INFO [at_datamodule.py:410] (1/4) About to get test-other cuts
2023-12-21 10:53:22,241 INFO [at_datamodule.py:346] (1/4) About to create dev dataset
2023-12-21 10:53:22,699 INFO [at_datamodule.py:363] (1/4) About to create dev dataloader
2023-12-21 10:53:49,468 INFO [train.py:886] (1/4) Epoch 1, batch 0, loss[loss=1.842, audio_tagging_loss=1.842, over 24060.00 frames. ], tot_loss[loss=1.842, audio_tagging_loss=1.842, over 24060.00 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 2.0
2023-12-21 10:53:49,468 INFO [train.py:909] (1/4) Computing validation loss
2023-12-21 10:54:14,725 INFO [train.py:917] (1/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames.
2023-12-21 10:54:14,726 INFO [train.py:918] (1/4) Maximum memory allocated so far is 13114MB
2023-12-21 10:54:19,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3
2023-12-21 10:54:23,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=0.0, ans=0.1
2023-12-21 10:54:25,383 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+02 8.487e+02 9.999e+02 1.363e+03 1.706e+03, threshold=4.000e+03, percent-clipped=0.0
2023-12-21 10:54:28,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=4.026666666666666
2023-12-21 10:54:29,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=66.66666666666667, ans=0.1975
2023-12-21 10:54:36,996 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.136e+01 2.542e+02 7.819e+02 1.187e+03 1.783e+03, threshold=3.128e+03, percent-clipped=0.0
2023-12-21 10:54:41,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=169.66 vs. limit=4.053333333333334
2023-12-21 10:55:01,181 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 9.013e+01 2.542e+02 8.019e+02 1.783e+03, threshold=1.017e+03, percent-clipped=0.0
2023-12-21 10:55:02,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=24.19 vs. limit=3.04
2023-12-21 10:55:05,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=510.59 vs. limit=7.7
2023-12-21 10:55:05,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=266.6666666666667, ans=0.29733333333333334
2023-12-21 10:55:08,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=266.6666666666667, ans=0.19
2023-12-21 10:55:08,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=449.90 vs. limit=7.6
2023-12-21 10:55:13,427 INFO [train.py:886] (1/4) Epoch 1, batch 50, loss[loss=0.04521, audio_tagging_loss=0.04521, over 25000.00 frames. ], tot_loss[loss=0.2979, audio_tagging_loss=0.2979, over 1112899.92 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-21 10:55:15,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=333.3333333333333, ans=0.484375
2023-12-21 10:55:19,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=480.23 vs. limit=7.625
2023-12-21 10:55:27,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=26.32 vs. limit=4.16
2023-12-21 10:55:28,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=400.0, ans=7.65
2023-12-21 10:55:38,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=355.76 vs. limit=7.85
2023-12-21 10:55:45,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.22 vs. limit=3.07
2023-12-21 10:55:53,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=533.3333333333334, ans=7.7
2023-12-21 10:55:57,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=228.94 vs. limit=7.7
2023-12-21 10:56:07,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=600.0, ans=0.471875
2023-12-21 10:56:08,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294
2023-12-21 10:56:15,146 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+01 2.804e+01 4.984e+01 1.709e+02 1.783e+03, threshold=9.968e+01, percent-clipped=0.0
2023-12-21 10:56:15,173 INFO [train.py:886] (1/4) Epoch 1, batch 100, loss[loss=0.03049, audio_tagging_loss=0.03049, over 25000.00 frames. ], tot_loss[loss=0.153, audio_tagging_loss=0.153, over 1971753.74 frames. ], batch size: 100, lr: 2.70e-02, grad_scale: 4.0
2023-12-21 10:56:30,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=733.3333333333334, ans=0.7573333333333333
2023-12-21 10:56:32,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=43.17 vs. limit=5.183333333333334
2023-12-21 10:56:39,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=15.10 vs. limit=5.4
2023-12-21 10:56:44,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=286.10 vs. limit=7.8
2023-12-21 10:56:48,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=66.97 vs. limit=4.16
2023-12-21 10:56:55,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=174.25 vs. limit=8.15
2023-12-21 10:56:57,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=201.38 vs. limit=7.825
2023-12-21 10:56:57,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=196.72 vs. limit=5.433333333333334
2023-12-21 10:57:03,553 WARNING [optim.py:500] (1/4) Scaling gradients by 0.07864928245544434, model_norm_threshold=99.68033599853516
2023-12-21 10:57:03,712 WARNING [optim.py:572] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.166e+05, grad_sumsq=5.753e+08, orig_rms_sq=1.246e-03
2023-12-21 10:57:07,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=27.32 vs. limit=5.233333333333333
2023-12-21 10:57:10,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=933.3333333333334, ans=0.09416666666666668
2023-12-21 10:57:12,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=152.29 vs. limit=5.466666666666667
2023-12-21 10:57:15,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=44.77 vs. limit=7.875
2023-12-21 10:57:16,098 INFO [train.py:886] (1/4) Epoch 1, batch 150, loss[loss=0.03588, audio_tagging_loss=0.03588, over 21954.00 frames. ], tot_loss[loss=0.1044, audio_tagging_loss=0.1044, over 2629663.09 frames. ], batch size: 107, lr: 2.93e-02, grad_scale: 2.0
2023-12-21 10:57:31,312 WARNING [optim.py:500] (1/4) Scaling gradients by 0.0951763167977333, model_norm_threshold=99.68033599853516
2023-12-21 10:57:31,472 WARNING [optim.py:572] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.44, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.792e+05, grad_sumsq=3.739e+08, orig_rms_sq=1.282e-03
2023-12-21 10:57:40,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=4.453333333333333
2023-12-21 10:57:41,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=109.78 vs. limit=7.925
2023-12-21 10:57:45,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=110.17 vs. limit=7.925
2023-12-21 10:57:51,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=213.76 vs. limit=7.925
2023-12-21 10:57:51,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=103.47 vs. limit=7.925
2023-12-21 10:57:51,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=21.47 vs. limit=4.453333333333333
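The lr values printed by train.py:886 (2.25e-02 at batch 0, 2.48e-02 at batch 50, 2.70e-02 at batch 100, 2.93e-02 at batch 150) follow from base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 under icefall's Eden schedule. A sketch of that formula is below; the warmup constants (warmup_batches=500, warmup_start=0.5) are assumed defaults from icefall's optim.py, not values printed in this log.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5,
                warmup_batches: float = 500.0, warmup_start: float = 0.5) -> float:
        # Decay factor in terms of batches and (fractional) epochs seen so far.
        factor = (((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                  * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)
        # Linear warmup from warmup_start up to 1.0 over warmup_batches.
        warmup = (1.0 if batch >= warmup_batches
                  else warmup_start + (1.0 - warmup_start) * batch / warmup_batches)
        return base_lr * factor * warmup

    print(eden_lr(0.045, 0, 0.0))    # 0.0225   -> "lr: 2.25e-02" at batch 0
    print(eden_lr(0.045, 100, 0.0))  # ~0.02700 -> "lr: 2.70e-02" at batch 100
    print(eden_lr(0.045, 500, 0.0))  # ~0.04495 -> "lr: 4.49e-02" from batch 500 on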
2023-12-21 10:57:54,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1200.0, ans=0.04625
2023-12-21 10:58:13,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=202.67 vs. limit=7.975
2023-12-21 10:58:14,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1266.6666666666667, ans=5.791666666666667
2023-12-21 10:58:15,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=130.80 vs. limit=5.633333333333334
2023-12-21 10:58:18,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=71.15 vs. limit=8.0
2023-12-21 10:58:19,091 INFO [train.py:886] (1/4) Epoch 1, batch 200, loss[loss=0.02714, audio_tagging_loss=0.02714, over 24022.00 frames. ], tot_loss[loss=0.07968, audio_tagging_loss=0.07968, over 3142839.66 frames. ], batch size: 100, lr: 3.15e-02, grad_scale: 4.0
2023-12-21 10:58:19,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=66.28 vs. limit=8.0
2023-12-21 10:58:20,193 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.593e+01 2.983e+01 3.603e+01 1.267e+03, threshold=5.966e+01, percent-clipped=10.0
2023-12-21 10:58:23,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=184.61 vs. limit=8.0
2023-12-21 10:58:29,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1333.3333333333333, ans=0.33333333333333337
2023-12-21 10:58:39,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=413.32 vs. limit=8.025
2023-12-21 10:58:40,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=21.32 vs. limit=5.35
2023-12-21 10:58:45,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1466.6666666666667, ans=0.31666666666666665
2023-12-21 10:58:49,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=44.15 vs. limit=8.05
2023-12-21 10:58:51,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=4.586666666666667
2023-12-21 10:58:58,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=4.613333333333333
2023-12-21 10:58:59,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=231.30 vs. limit=8.075
2023-12-21 10:59:02,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=81.60 vs. limit=8.075
2023-12-21 10:59:02,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=38.26 vs. limit=4.306666666666667
2023-12-21 10:59:08,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=169.89 vs. limit=8.7
2023-12-21 10:59:10,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=218.82 vs. limit=8.1
2023-12-21 10:59:13,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=83.70 vs. limit=8.1
2023-12-21 10:59:14,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=182.27 vs. limit=8.7
2023-12-21 10:59:21,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.42 vs. limit=8.125
2023-12-21 10:59:22,004 INFO [train.py:886] (1/4) Epoch 1, batch 250, loss[loss=0.03285, audio_tagging_loss=0.03285, over 25000.00 frames. ], tot_loss[loss=0.06461, audio_tagging_loss=0.06461, over 3549368.00 frames. ], batch size: 100, lr: 3.38e-02, grad_scale: 2.0
2023-12-21 10:59:24,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=4.666666666666667
2023-12-21 10:59:28,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1666.6666666666667, ans=0.421875
2023-12-21 10:59:29,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1666.6666666666667, ans=0.421875
2023-12-21 10:59:34,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=26.35 vs. limit=8.15
2023-12-21 10:59:37,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=68.07 vs. limit=8.15
2023-12-21 10:59:39,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=40.07 vs. limit=8.15
2023-12-21 10:59:42,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1733.3333333333333, ans=6.083333333333333
2023-12-21 10:59:45,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1800.0, ans=0.275
2023-12-21 10:59:46,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1800.0, ans=0.0595
2023-12-21 10:59:50,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=70.67 vs. limit=5.9
2023-12-21 10:59:51,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=15.28 vs. limit=5.45
2023-12-21 10:59:52,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=105.20 vs. limit=8.175
2023-12-21 10:59:57,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.90 vs. limit=5.45
2023-12-21 11:00:00,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=4.746666666666667
2023-12-21 11:00:02,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-21 11:00:02,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1866.6666666666667, ans=0.26666666666666666
2023-12-21 11:00:03,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1866.6666666666667, ans=0.8346666666666667
2023-12-21 11:00:06,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866.6666666666667, ans=0.2813333333333333
2023-12-21 11:00:10,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.22 vs. limit=5.966666666666667
2023-12-21 11:00:14,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=51.26 vs. limit=8.225
2023-12-21 11:00:17,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=49.70 vs. limit=8.225
2023-12-21 11:00:20,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1933.3333333333333, ans=8.95
2023-12-21 11:00:21,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-21 11:00:23,439 INFO [train.py:886] (1/4) Epoch 1, batch 300, loss[loss=0.02955, audio_tagging_loss=0.02955, over 24750.00 frames. ], tot_loss[loss=0.05511, audio_tagging_loss=0.05511, over 3862112.50 frames. ], batch size: 99, lr: 3.60e-02, grad_scale: 4.0
2023-12-21 11:00:25,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=53.33 vs. limit=8.25
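The ScheduledFloat lines report module constants (dropout rates, skip rates, balancer probabilities, whitening limits) that are piecewise-linear functions of batch_count rather than fixed hyperparameters. A minimal sketch of the interpolation follows; the breakpoints below are an assumption, chosen to reproduce the encoder_embed.dropout.p values actually logged (0.3 at batch_count=0, 0.29733... at batch_count=266.67).

    import bisect

    def scheduled_float(batch_count: float, points) -> float:
        """points: ascending (batch_count, value) pairs; linear in between,
        clamped to the end values outside the range."""
        xs = [x for x, _ in points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_schedule = [(0.0, 0.3), (20000.0, 0.1)]  # assumed breakpoints
    print(scheduled_float(0.0, dropout_schedule))                # 0.3
    print(scheduled_float(266.6666666666667, dropout_schedule))  # 0.29733...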
2023-12-21 11:00:25,741 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.506e+01 2.969e+01 4.379e+01 2.139e+02, threshold=5.939e+01, percent-clipped=11.0
2023-12-21 11:00:33,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2000.0, ans=0.055
2023-12-21 11:00:34,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2000.0, ans=0.40625
2023-12-21 11:00:38,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2066.6666666666665, ans=0.08708333333333335
2023-12-21 11:00:54,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2133.3333333333335, ans=0.12
2023-12-21 11:00:59,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=57.63 vs. limit=9.1
2023-12-21 11:01:02,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2200.0, ans=0.396875
2023-12-21 11:01:03,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=40.67 vs. limit=8.325
2023-12-21 11:01:08,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=109.40 vs. limit=9.15
2023-12-21 11:01:24,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=111.10 vs. limit=8.35
2023-12-21 11:01:25,857 INFO [train.py:886] (1/4) Epoch 1, batch 350, loss[loss=0.02701, audio_tagging_loss=0.02701, over 25000.00 frames. ], tot_loss[loss=0.04839, audio_tagging_loss=0.04839, over 4097026.65 frames. ], batch size: 100, lr: 3.83e-02, grad_scale: 4.0
2023-12-21 11:01:26,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=133.51 vs. limit=8.375
2023-12-21 11:01:26,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=62.58 vs. limit=8.375
2023-12-21 11:01:27,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=101.07 vs. limit=8.375
2023-12-21 11:01:29,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=46.27 vs. limit=9.25
2023-12-21 11:01:31,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2333.3333333333335, ans=0.390625
2023-12-21 11:01:37,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=39.69 vs. limit=9.3
2023-12-21 11:01:37,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=68.67 vs. limit=9.3
2023-12-21 11:01:39,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2400.0, ans=0.3875
2023-12-21 11:01:42,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=143.53 vs. limit=8.4
2023-12-21 11:01:49,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=265.54 vs. limit=8.4
2023-12-21 11:01:53,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=4.986666666666666
2023-12-21 11:01:58,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2466.6666666666665, ans=0.8136666666666666
2023-12-21 11:02:03,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=28.99 vs. limit=8.45
2023-12-21 11:02:06,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=45.67 vs. limit=8.45
2023-12-21 11:02:08,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=49.41 vs. limit=8.45
2023-12-21 11:02:15,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=5.04
2023-12-21 11:02:17,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2600.0, ans=0.041875
2023-12-21 11:02:26,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2600.0, ans=0.041499999999999995
2023-12-21 11:02:28,578 INFO [train.py:886] (1/4) Epoch 1, batch 400, loss[loss=0.0298, audio_tagging_loss=0.0298, over 22354.00 frames. ], tot_loss[loss=0.04281, audio_tagging_loss=0.04281, over 4288887.50 frames. ], batch size: 107, lr: 4.05e-02, grad_scale: 8.0
2023-12-21 11:02:28,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2666.6666666666665, ans=0.04166666666666667
2023-12-21 11:02:30,850 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.972e+01 3.451e+01 4.422e+01 2.511e+02, threshold=6.902e+01, percent-clipped=7.0
2023-12-21 11:02:42,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=8.525
2023-12-21 11:02:44,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2733.3333333333335, ans=0.371875
2023-12-21 11:02:51,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=5.7
2023-12-21 11:02:53,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2800.0, ans=0.36875
2023-12-21 11:02:53,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=73.15 vs. limit=8.55
2023-12-21 11:02:58,694 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.070e+00
2023-12-21 11:02:59,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.11 vs. limit=5.7
2023-12-21 11:02:59,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.32 vs. limit=9.6
2023-12-21 11:03:03,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=8.55
2023-12-21 11:03:04,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=85.10 vs. limit=8.575
2023-12-21 11:03:09,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866.6666666666665, ans=0.2713333333333333
2023-12-21 11:03:09,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2866.6666666666665, ans=9.65
2023-12-21 11:03:12,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=151.22 vs. limit=8.575
2023-12-21 11:03:19,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2933.3333333333335, ans=0.3625
2023-12-21 11:03:24,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=122.23 vs. limit=8.6
2023-12-21 11:03:24,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=9.7
2023-12-21 11:03:25,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=80.00 vs. limit=8.6
2023-12-21 11:03:26,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.57 vs. limit=6.466666666666667
2023-12-21 11:03:29,119 INFO [train.py:886] (1/4) Epoch 1, batch 450, loss[loss=0.02352, audio_tagging_loss=0.02352, over 25000.00 frames. ], tot_loss[loss=0.03879, audio_tagging_loss=0.03879, over 4436292.09 frames. ], batch size: 100, lr: 4.28e-02, grad_scale: 8.0
2023-12-21 11:03:29,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=30.92 vs. limit=9.75
2023-12-21 11:03:37,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.28 vs. limit=8.625
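In the Clipping_scale warnings from optim.py:484, the five numbers are the minimum, 25th, 50th and 75th percentiles and maximum of recently observed gradient norms, and the printed threshold is clipping_scale times the median (e.g. 2.0 x 3.451e+01 = 6.902e+01 at batch 400); gradients whose norm exceeds the threshold are scaled down, which is what the earlier "Scaling gradients by ..., model_norm_threshold=..." warnings from optim.py:500 report. Below is a hedged sketch of that mechanism, modeled on ScaledAdam in icefall's optim.py; the history length is an assumption, not a value printed in the log.

    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 200):
            self.clipping_scale = clipping_scale
            self.history = history
            self.norms: list[float] = []  # recent total gradient norms

        def __call__(self, params) -> tuple[float, float]:
            grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
            norm = torch.cat(grads).norm().item()
            self.norms = (self.norms + [norm])[-self.history:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:
                scale = threshold / norm  # cf. "Scaling gradients by 0.0786..."
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(scale)
            return norm, threshold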
2023-12-21 11:03:37,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=30.94 vs. limit=8.625
2023-12-21 11:03:39,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.48 vs. limit=9.75
2023-12-21 11:03:42,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=72.57 vs. limit=8.65
2023-12-21 11:03:43,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=8.65
2023-12-21 11:03:48,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3066.6666666666665, ans=0.246
2023-12-21 11:03:54,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=60.07 vs. limit=8.675
2023-12-21 11:03:55,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=54.87 vs. limit=8.675
2023-12-21 11:04:03,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3200.0, ans=0.08
2023-12-21 11:04:05,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3200.0, ans=0.35
2023-12-21 11:04:07,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3200.0, ans=0.09999999999999998
2023-12-21 11:04:11,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=8.7
2023-12-21 11:04:22,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=5.306666666666667
2023-12-21 11:04:29,729 INFO [train.py:886] (1/4) Epoch 1, batch 500, loss[loss=0.0269, audio_tagging_loss=0.0269, over 25000.00 frames. ], tot_loss[loss=0.03561, audio_tagging_loss=0.03561, over 4553317.18 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:04:30,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=8.75
2023-12-21 11:04:31,971 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.916e+01 3.389e+01 4.125e+01 8.969e+01, threshold=6.779e+01, percent-clipped=3.0
2023-12-21 11:04:32,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3333.3333333333335, ans=0.09899494936611666
2023-12-21 11:04:36,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=10.0
2023-12-21 11:04:37,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3333.3333333333335, ans=0.08333333333333331
2023-12-21 11:04:42,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3400.0, ans=0.340625
2023-12-21 11:04:58,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.85 vs. limit=10.1
2023-12-21 11:05:02,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=10.1
2023-12-21 11:05:05,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3533.3333333333335, ans=0.7763333333333333
2023-12-21 11:05:08,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=6.766666666666667
2023-12-21 11:05:12,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=8.825
2023-12-21 11:05:21,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=5.4399999999999995
2023-12-21 11:05:29,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3666.6666666666665, ans=0.328125
2023-12-21 11:05:31,204 INFO [train.py:886] (1/4) Epoch 1, batch 550, loss[loss=0.02647, audio_tagging_loss=0.02647, over 25000.00 frames. ], tot_loss[loss=0.0334, audio_tagging_loss=0.0334, over 4641236.60 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:05:35,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=7.44 vs. limit=5.0
2023-12-21 11:05:57,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=87.85 vs. limit=8.925
2023-12-21 11:05:58,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3800.0, ans=0.321875
2023-12-21 11:06:01,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3800.0, ans=0.262
2023-12-21 11:06:03,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=8.925
2023-12-21 11:06:04,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=23.12 vs. limit=6.9
2023-12-21 11:06:17,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3933.3333333333335, ans=8.975
2023-12-21 11:06:18,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.72 vs. limit=10.45
2023-12-21 11:06:29,519 INFO [train.py:886] (1/4) Epoch 1, batch 600, loss[loss=0.02855, audio_tagging_loss=0.02855, over 24750.00 frames. ], tot_loss[loss=0.03183, audio_tagging_loss=0.03183, over 4709792.59 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:06:29,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=9.0
2023-12-21 11:06:31,738 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.660e+01 5.036e+01 7.229e+01 1.228e+02, threshold=1.007e+02, percent-clipped=27.0
2023-12-21 11:06:32,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=48.21 vs. limit=9.0
2023-12-21 11:06:32,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=10.5
2023-12-21 11:06:54,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.58 vs. limit=7.066666666666666
2023-12-21 11:06:59,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4133.333333333333, ans=0.009971014492753623
2023-12-21 11:07:00,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4133.333333333333, ans=9.05
2023-12-21 11:07:04,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4200.0, ans=0.04916666666666667
2023-12-21 11:07:05,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=67.52 vs. limit=9.075
2023-12-21 11:07:09,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4200.0, ans=0.753
2023-12-21 11:07:10,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4200.0, ans=9.075
2023-12-21 11:07:12,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=9.075
2023-12-21 11:07:16,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=9.1
2023-12-21 11:07:25,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=66.58 vs. limit=9.1
2023-12-21 11:07:27,741 INFO [train.py:886] (1/4) Epoch 1, batch 650, loss[loss=0.02491, audio_tagging_loss=0.02491, over 24750.00 frames. ], tot_loss[loss=0.03046, audio_tagging_loss=0.03046, over 4760044.90 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:07:30,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=55.96 vs. limit=9.125
2023-12-21 11:07:42,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=9.15
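The Whitening lines from scaling.py:1022 compare a per-module anisotropy statistic against a limit that itself grows on a schedule (the same limits show up in ScheduledFloat lines as *.whitening_limit). A sketch of the statistic, modeled on _whitening_metric in icefall's scaling.py (the exact normalization there may differ slightly): for the centered per-group feature covariance C, it is mean(diag(C @ C)) / mean(diag(C))**2, which is 1.0 for perfectly "white" features and grows as the covariance becomes anisotropic; when the metric exceeds the limit, the Whiten module applies a penalty gradient pushing activations back toward whiteness.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        x = x.reshape(-1, x.shape[-1])               # (frames, channels)
        frames, channels = x.shape
        cpg = channels // num_groups                 # channels per group
        x = x.reshape(frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)          # center before covariance
        covar = x.transpose(1, 2) @ x                # (num_groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_diag_of_sq = (covar ** 2).sum() / (num_groups * cpg)
        return (mean_diag_of_sq / (mean_diag ** 2 + 1e-20)).item()

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(1000, 512)))  # near 1: already white
    x = torch.randn(1000, 512) * torch.linspace(0.1, 3.0, 512)
    print(whitening_metric(x))                       # > 1: anisotropic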
2023-12-21 11:07:54,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4466.666666666667, ans=0.290625
2023-12-21 11:07:56,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4466.666666666667, ans=0.04805555555555556
2023-12-21 11:07:57,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=57.83 vs. limit=9.175
2023-12-21 11:08:09,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=10.9
2023-12-21 11:08:12,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4600.0, ans=0.0475
2023-12-21 11:08:26,817 INFO [train.py:886] (1/4) Epoch 1, batch 700, loss[loss=0.02268, audio_tagging_loss=0.02268, over 24750.00 frames. ], tot_loss[loss=0.02923, audio_tagging_loss=0.02923, over 4797140.62 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:08:28,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=64.48 vs. limit=9.25
2023-12-21 11:08:28,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.878e+01 4.940e+01 6.035e+01 7.817e+01 1.849e+02, threshold=1.207e+02, percent-clipped=12.0
2023-12-21 11:08:31,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.92 vs. limit=6.166666666666667
2023-12-21 11:08:33,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4666.666666666667, ans=11.0
2023-12-21 11:08:44,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4733.333333333333, ans=0.009840579710144928
2023-12-21 11:08:45,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4733.333333333333, ans=9.275
2023-12-21 11:08:49,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4800.0, ans=0.275
2023-12-21 11:09:01,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=38.28 vs. limit=11.15
2023-12-21 11:09:09,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4866.666666666667, ans=8.041666666666668
2023-12-21 11:09:11,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=9.35
2023-12-21 11:09:18,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=4.986666666666666
2023-12-21 11:09:22,054 INFO [train.py:886] (1/4) Epoch 1, batch 750, loss[loss=0.02331, audio_tagging_loss=0.02331, over 25000.00 frames. ], tot_loss[loss=0.02811, audio_tagging_loss=0.02811, over 4822174.31 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0
2023-12-21 11:09:29,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=9.375
2023-12-21 11:09:32,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=61.89 vs. limit=9.375
2023-12-21 11:09:33,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=26.17 vs. limit=9.4
2023-12-21 11:09:38,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.34 vs. limit=11.3
2023-12-21 11:09:39,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=60.67 vs. limit=9.4
2023-12-21 11:09:40,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5066.666666666667, ans=0.7226666666666667
2023-12-21 11:09:43,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5066.666666666667, ans=0.7226666666666667
2023-12-21 11:09:47,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5133.333333333333, ans=0.259375
2023-12-21 11:09:57,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5200.0, ans=0.03375
2023-12-21 11:09:59,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5200.0, ans=0.248
2023-12-21 11:10:11,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=37.76 vs. limit=9.475
2023-12-21 11:10:19,873 INFO [train.py:886] (1/4) Epoch 1, batch 800, loss[loss=0.02449, audio_tagging_loss=0.02449, over 25000.00 frames. ], tot_loss[loss=0.02718, audio_tagging_loss=0.02718, over 4853834.23 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 16.0
2023-12-21 11:10:22,010 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.496e+01 3.313e+01 4.173e+01 5.276e+01 1.022e+02, threshold=8.346e+01, percent-clipped=0.0
2023-12-21 11:10:31,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=9.525
2023-12-21 11:10:32,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=3.81
2023-12-21 11:10:38,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5400.0, ans=0.246
2023-12-21 11:10:47,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5466.666666666667, ans=0.24533333333333332
2023-12-21 11:10:55,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.10 vs. limit=7.766666666666667
2023-12-21 11:11:02,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5533.333333333333, ans=0.7063333333333334
2023-12-21 11:11:16,880 INFO [train.py:886] (1/4) Epoch 1, batch 850, loss[loss=0.02501, audio_tagging_loss=0.02501, over 25000.00 frames. ], tot_loss[loss=0.02653, audio_tagging_loss=0.02653, over 4877115.73 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 16.0
2023-12-21 11:11:21,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5666.666666666667, ans=0.234375
2023-12-21 11:11:23,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5666.666666666667, ans=0.07
2023-12-21 11:11:26,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.48 vs. limit=9.625
2023-12-21 11:11:27,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=9.625
2023-12-21 11:11:48,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5800.0, ans=0.009608695652173913
2023-12-21 11:11:52,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5866.666666666667, ans=0.009594202898550725
2023-12-21 11:11:54,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=11.9
2023-12-21 11:12:04,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5933.333333333333, ans=8.708333333333332
2023-12-21 11:12:07,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=9.725
2023-12-21 11:12:11,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=9.725
2023-12-21 11:12:12,831 INFO [train.py:886] (1/4) Epoch 1, batch 900, loss[loss=0.02047, audio_tagging_loss=0.02047, over 24750.00 frames. ], tot_loss[loss=0.02607, audio_tagging_loss=0.02607, over 4896254.77 frames. ], batch size: 99, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:12:14,798 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 3.238e+01 4.010e+01 4.970e+01 2.854e+02, threshold=8.021e+01, percent-clipped=5.0
2023-12-21 11:12:17,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=9.75
2023-12-21 11:12:23,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=41.74 vs. limit=9.775
2023-12-21 11:12:27,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=6066.666666666667, ans=0.0
2023-12-21 11:12:30,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=12.05
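The grad_scale value in the train.py:886 lines climbs 2.0 -> 4.0 -> 8.0 -> 16.0 as training stabilizes; with use_fp16=True this is dynamic loss scaling for mixed precision. A generic sketch with torch.cuda.amp follows (requires a GPU; init_scale and growth_interval here are illustrative assumptions, not values read from the log):

    import torch

    model = torch.nn.Linear(80, 527).cuda()   # feature_dim -> num_events
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=500)

    for step in range(1000):
        x = torch.randn(100, 80, device="cuda")
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()  # loss multiplied by the current scale
        scaler.step(opt)               # unscales grads; skips step on inf/nan
        scaler.update()                # doubles the scale after enough good steps
        opt.zero_grad()
        # scaler.get_scale() is the value the log reports as grad_scale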
2023-12-21 11:12:39,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.06 vs. limit=12.1
2023-12-21 11:12:47,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=38.42 vs. limit=9.825
2023-12-21 11:12:52,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=9.825
2023-12-21 11:12:57,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=6266.666666666667, ans=0.20625
2023-12-21 11:12:57,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=22.97 vs. limit=9.85
2023-12-21 11:13:00,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=5.253333333333334
2023-12-21 11:13:09,786 INFO [train.py:886] (1/4) Epoch 1, batch 950, loss[loss=0.02915, audio_tagging_loss=0.02915, over 24750.00 frames. ], tot_loss[loss=0.0259, audio_tagging_loss=0.0259, over 4905181.89 frames. ], batch size: 99, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:13:23,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=24.45 vs. limit=9.9
2023-12-21 11:13:25,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=6400.0, ans=0.2
2023-12-21 11:13:38,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.73 vs. limit=9.925
2023-12-21 11:13:48,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=36.04 vs. limit=9.95
2023-12-21 11:13:53,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.23 vs. limit=8.3
2023-12-21 11:13:57,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6600.0, ans=0.23399999999999999
2023-12-21 11:14:00,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.65 vs. limit=12.45
2023-12-21 11:14:04,973 INFO [train.py:886] (1/4) Epoch 1, batch 1000, loss[loss=0.01957, audio_tagging_loss=0.01957, over 25000.00 frames. ], tot_loss[loss=0.0254, audio_tagging_loss=0.0254, over 4909776.86 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:14:07,686 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.858e+01 3.323e+01 3.988e+01 7.077e+01, threshold=6.647e+01, percent-clipped=0.0
2023-12-21 11:14:09,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. limit=8.333333333333334
2023-12-21 11:14:19,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=10.025
2023-12-21 11:14:38,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=23.97 vs. limit=10.075
2023-12-21 11:14:46,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=6866.666666666667, ans=0.0093768115942029
2023-12-21 11:14:56,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=10.1
2023-12-21 11:14:59,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=10.125
2023-12-21 11:14:59,655 INFO [train.py:886] (1/4) Epoch 1, batch 1050, loss[loss=0.0224, audio_tagging_loss=0.0224, over 25000.00 frames. ], tot_loss[loss=0.02481, audio_tagging_loss=0.02481, over 4918379.78 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:15:00,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=7000.0, ans=0.171875
2023-12-21 11:15:03,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=10.125
2023-12-21 11:15:05,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=12.75
2023-12-21 11:15:10,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=7066.666666666667, ans=0.037222222222222226
2023-12-21 11:15:18,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=7066.666666666667, ans=0.6526666666666667
2023-12-21 11:15:20,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=30.00 vs. limit=10.15
2023-12-21 11:15:26,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=7133.333333333333, ans=0.009318840579710145
2023-12-21 11:15:31,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7200.0, ans=0.22799999999999998
2023-12-21 11:15:53,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=4.09
2023-12-21 11:15:54,681 INFO [train.py:886] (1/4) Epoch 1, batch 1100, loss[loss=0.02137, audio_tagging_loss=0.02137, over 25000.00 frames. ], tot_loss[loss=0.0243, audio_tagging_loss=0.0243, over 4928431.65 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:15:56,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=13.0
2023-12-21 11:15:56,647 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.590e+01 3.009e+01 3.352e+01 1.810e+02, threshold=6.019e+01, percent-clipped=1.0
2023-12-21 11:16:00,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=10.25
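The Whitening entries from scaling.py:1022 appear when a feature group's whitening metric exceeds its scheduled limit. As I understand the Whiten modules, the metric compares the eigenvalue spread of the feature covariance against the perfectly "white" case, and it can be computed from traces alone; a sketch under that assumption:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (..., num_channels). Returns a scalar >= 1 that equals 1.0 when the
    # per-group feature covariance is proportional to the identity ("white")
    # and grows with the eigenvalue spread. This is E[lambda^2] / E[lambda]^2,
    # evaluated via tr(C^2) and tr(C) so no eigendecomposition is needed.
    x = x.reshape(-1, x.shape[-1])
    n, c = x.shape
    g = c // num_groups
    x = x.reshape(n, num_groups, g).transpose(0, 1)   # (groups, n, g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n      # (groups, g, g)
    mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1)  # tr(C) / g
    mean_eig_sq = (cov ** 2).sum(dim=(1, 2)) / g                # tr(C^2) / g
    return (mean_eig_sq / mean_eig ** 2).mean()

white = torch.randn(1000, 256)
print(whitening_metric(white))  # close to 1.0 for white noise

A metric of 1.0 means the covariance is isotropic; large readings above (e.g. metric=41.74 vs. limit=9.775) mark activations whose covariance is strongly anisotropic, presumably triggering the module's corrective gradient term.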
2023-12-21 11:17:17,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=7800.0, ans=0.009173913043478261
2023-12-21 11:17:18,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7800.0, ans=0.222
2023-12-21 11:17:33,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=7933.333333333333, ans=0.128125
2023-12-21 11:17:40,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7933.333333333333, ans=0.22066666666666668
2023-12-21 11:17:43,361 INFO [train.py:886] (1/4) Epoch 1, batch 1200, loss[loss=0.02428, audio_tagging_loss=0.02428, over 25000.00 frames. ], tot_loss[loss=0.02368, audio_tagging_loss=0.02368, over 4940074.45 frames. ], batch size: 100, lr: 4.47e-02, grad_scale: 32.0
2023-12-21 11:17:45,264 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.423e+01 2.660e+01 3.185e+01 5.087e+01, threshold=5.319e+01, percent-clipped=0.0
2023-12-21 11:17:49,476 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.200e+00
2023-12-21 11:18:00,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=42.56 vs. limit=10.525
2023-12-21 11:18:03,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=8066.666666666667, ans=0.321
2023-12-21 11:18:10,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=13.6
2023-12-21 11:18:19,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=10.575
2023-12-21 11:18:24,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=7.066666666666666
2023-12-21 11:18:37,384 INFO [train.py:886] (1/4) Epoch 1, batch 1250, loss[loss=0.02634, audio_tagging_loss=0.02634, over 24750.00 frames. ], tot_loss[loss=0.02369, audio_tagging_loss=0.02369, over 4945258.27 frames. ], batch size: 99, lr: 4.47e-02, grad_scale: 32.0
2023-12-21 11:18:56,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. limit=7.1
2023-12-21 11:19:01,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=10.675
2023-12-21 11:19:03,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=8466.666666666666, ans=0.0
2023-12-21 11:19:07,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=8466.666666666666, ans=0.0313888888888889
2023-12-21 11:19:20,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=10.725
2023-12-21 11:19:30,115 INFO [train.py:886] (1/4) Epoch 1, batch 1300, loss[loss=0.02274, audio_tagging_loss=0.02274, over 24750.00 frames. ], tot_loss[loss=0.02353, audio_tagging_loss=0.02353, over 4940758.65 frames. ], batch size: 99, lr: 4.47e-02, grad_scale: 32.0
2023-12-21 11:19:32,768 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.283e+01 2.585e+01 3.133e+01 4.200e+01, threshold=5.169e+01, percent-clipped=0.0
2023-12-21 11:19:42,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=10.775
2023-12-21 11:19:44,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=8733.333333333334, ans=0.125
2023-12-21 11:19:46,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=8733.333333333334, ans=0.125
2023-12-21 11:19:51,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212
2023-12-21 11:19:52,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=8800.0, ans=0.008956521739130436
2023-12-21 11:20:02,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=7.546666666666667
2023-12-21 11:20:09,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8866.666666666666, ans=0.21133333333333332
2023-12-21 11:20:24,102 INFO [train.py:886] (1/4) Epoch 1, batch 1350, loss[loss=0.01867, audio_tagging_loss=0.01867, over 25000.00 frames. ], tot_loss[loss=0.02316, audio_tagging_loss=0.02316, over 4936970.40 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:20:27,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.61 vs. limit=14.25
2023-12-21 11:20:32,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=9000.0, ans=0.125
2023-12-21 11:20:55,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=9200.0, ans=0.125
2023-12-21 11:21:00,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=9200.0, ans=0.5780000000000001
2023-12-21 11:21:03,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=9200.0, ans=0.125
2023-12-21 11:21:03,187 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.638e+00
2023-12-21 11:21:05,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=9200.0, ans=0.125
2023-12-21 11:21:08,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9266.666666666666, ans=0.20733333333333334
2023-12-21 11:21:11,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=26.96 vs. limit=10.975
2023-12-21 11:21:12,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.22 vs. limit=7.316666666666666
2023-12-21 11:21:16,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=9266.666666666666, ans=0.125
2023-12-21 11:21:18,837 INFO [train.py:886] (1/4) Epoch 1, batch 1400, loss[loss=0.02203, audio_tagging_loss=0.02203, over 25000.00 frames. ], tot_loss[loss=0.02287, audio_tagging_loss=0.02287, over 4936507.89 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:21:20,784 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.157e+01 2.468e+01 2.846e+01 4.252e+01, threshold=4.936e+01, percent-clipped=0.0
2023-12-21 11:21:25,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=9333.333333333334, ans=0.125
2023-12-21 11:21:27,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9333.333333333334, ans=0.125
2023-12-21 11:21:32,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=9400.0, ans=0.571
2023-12-21 11:21:34,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=9400.0, ans=0.125
2023-12-21 11:21:50,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=9533.333333333334, ans=0.125
2023-12-21 11:21:53,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.82 vs. limit=14.65
2023-12-21 11:22:03,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=11.1
2023-12-21 11:22:03,984 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.310e+01
2023-12-21 11:22:09,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=9666.666666666666, ans=0.125
2023-12-21 11:22:11,322 INFO [train.py:886] (1/4) Epoch 1, batch 1450, loss[loss=0.02342, audio_tagging_loss=0.02342, over 24068.00 frames. ], tot_loss[loss=0.02256, audio_tagging_loss=0.02256, over 4941956.95 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:22:20,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.00 vs. limit=14.75
2023-12-21 11:22:24,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=11.15
2023-12-21 11:22:46,542 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.845e+00
2023-12-21 11:22:49,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9866.666666666666, ans=0.20133333333333334
2023-12-21 11:22:57,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9933.333333333334, ans=0.125
2023-12-21 11:22:58,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.84 vs. limit=11.225
2023-12-21 11:23:04,955 INFO [train.py:886] (1/4) Epoch 1, batch 1500, loss[loss=0.02158, audio_tagging_loss=0.02158, over 25000.00 frames. ], tot_loss[loss=0.02252, audio_tagging_loss=0.02252, over 4946744.87 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:23:06,867 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.537e+01 2.171e+01 2.518e+01 2.989e+01 4.492e+01, threshold=5.036e+01, percent-clipped=0.0
2023-12-21 11:23:07,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10000.0, ans=0.125
2023-12-21 11:23:20,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=11.275
2023-12-21 11:23:24,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=11.275
2023-12-21 11:23:24,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=11.275
2023-12-21 11:23:29,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=11.3
2023-12-21 11:23:30,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.06 vs. limit=10.066666666666666
2023-12-21 11:23:30,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.1
2023-12-21 11:23:37,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10200.0, ans=0.0
2023-12-21 11:23:38,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.88 vs. limit=10.1
2023-12-21 11:23:39,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=11.325
2023-12-21 11:23:48,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=10266.666666666666, ans=0.023888888888888894
2023-12-21 11:23:52,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10266.666666666666, ans=0.19733333333333333
2023-12-21 11:23:57,369 INFO [train.py:886] (1/4) Epoch 1, batch 1550, loss[loss=0.02254, audio_tagging_loss=0.02254, over 24750.00 frames. ], tot_loss[loss=0.02258, audio_tagging_loss=0.02258, over 4949180.64 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0
2023-12-21 11:23:57,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=10333.333333333334, ans=0.125
2023-12-21 11:24:00,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=10333.333333333334, ans=0.125
2023-12-21 11:24:12,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=11.4
2023-12-21 11:24:15,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=10400.0, ans=0.023333333333333334
2023-12-21 11:24:21,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=7.616666666666666
2023-12-21 11:24:26,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=10466.666666666666, ans=0.125
2023-12-21 11:24:41,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=10600.0, ans=0.07
2023-12-21 11:24:44,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=11.475
2023-12-21 11:24:49,632 INFO [train.py:886] (1/4) Epoch 1, batch 1600, loss[loss=0.01933, audio_tagging_loss=0.01933, over 24750.00 frames. ], tot_loss[loss=0.02244, audio_tagging_loss=0.02244, over 4940313.07 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0
2023-12-21 11:24:51,543 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.281e+01 2.645e+01 2.930e+01 5.191e+01, threshold=5.289e+01, percent-clipped=1.0
2023-12-21 11:24:51,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10666.666666666666, ans=0.125
2023-12-21 11:24:52,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10666.666666666666, ans=0.125
2023-12-21 11:24:56,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=10666.666666666666, ans=0.19333333333333336
2023-12-21 11:25:05,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=10733.333333333334, ans=0.025
2023-12-21 11:25:06,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.65 vs. limit=15.55
2023-12-21 11:25:10,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=11.55
2023-12-21 11:25:16,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=10800.0, ans=0.362
2023-12-21 11:25:21,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=10866.666666666666, ans=0.021388888888888895
2023-12-21 11:25:23,079 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.459e+01
2023-12-21 11:25:28,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=11.575
2023-12-21 11:25:32,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=15.7
2023-12-21 11:25:42,710 INFO [train.py:886] (1/4) Epoch 1, batch 1650, loss[loss=0.02307, audio_tagging_loss=0.02307, over 22814.00 frames. ], tot_loss[loss=0.02216, audio_tagging_loss=0.02216, over 4934278.38 frames. ], batch size: 107, lr: 4.45e-02, grad_scale: 32.0
2023-12-21 11:25:44,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11000.0, ans=0.125
2023-12-21 11:25:45,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=15.75
2023-12-21 11:25:49,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=11.625
2023-12-21 11:25:53,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=11066.666666666666, ans=0.00846376811594203
2023-12-21 11:25:58,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=11066.666666666666, ans=0.125
2023-12-21 11:26:10,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=11.675
2023-12-21 11:26:16,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=11200.0, ans=0.125
2023-12-21 11:26:16,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=11.7
2023-12-21 11:26:26,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=8.506666666666668
2023-12-21 11:26:33,613 INFO [train.py:886] (1/4) Epoch 1, batch 1700, loss[loss=0.02052, audio_tagging_loss=0.02052, over 25000.00 frames. ], tot_loss[loss=0.02183, audio_tagging_loss=0.02183, over 4936353.89 frames. ], batch size: 100, lr: 4.44e-02, grad_scale: 32.0
2023-12-21 11:26:37,007 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.242e+01 2.541e+01 2.981e+01 4.448e+01, threshold=5.082e+01, percent-clipped=0.0
2023-12-21 11:26:37,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=11333.333333333334, ans=0.5033333333333334
2023-12-21 11:26:44,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=23.11 vs. limit=11.75
2023-12-21 11:26:46,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=11.775
2023-12-21 11:26:47,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=11.775
2023-12-21 11:26:49,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.09 vs. limit=16.05
2023-12-21 11:26:54,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=16.1
2023-12-21 11:27:14,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=11533.333333333334, ans=0.125
2023-12-21 11:27:26,580 INFO [train.py:886] (1/4) Epoch 1, batch 1750, loss[loss=0.01836, audio_tagging_loss=0.01836, over 25000.00 frames. ], tot_loss[loss=0.02154, audio_tagging_loss=0.02154, over 4942812.68 frames. ], batch size: 100, lr: 4.44e-02, grad_scale: 32.0
2023-12-21 11:27:30,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=11666.666666666666, ans=16.25
2023-12-21 11:27:42,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=11.9
2023-12-21 11:27:44,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=36.35 vs. limit=11.9
2023-12-21 11:27:47,391 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.890e+00
2023-12-21 11:27:47,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=11.925
2023-12-21 11:27:59,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=11.95
2023-12-21 11:28:12,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11933.333333333334, ans=0.18066666666666667
2023-12-21 11:28:16,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=16.45
2023-12-21 11:28:19,023 INFO [train.py:886] (1/4) Epoch 1, batch 1800, loss[loss=0.02274, audio_tagging_loss=0.02274, over 25000.00 frames. ], tot_loss[loss=0.02143, audio_tagging_loss=0.02143, over 4950210.52 frames. ], batch size: 100, lr: 4.44e-02, grad_scale: 32.0
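The learning rate in the batch summaries decays smoothly from 4.49e-02 toward 4.35e-02. That trajectory matches the batch-dependent factor of icefall's Eden schedule evaluated with this run's base_lr=0.045 and lr_batches=7500; a sketch, omitting Eden's epoch-dependent factor and warmup, which contribute negligibly this early in epoch 1:

def eden_lr(batch: float, base_lr: float = 0.045,
            lr_batches: float = 7500.0) -> float:
    # Batch-dependent factor of the Eden schedule (sketch; the full
    # scheduler also applies an epoch-dependent factor of the same form).
    return base_lr * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25

for b in (850, 2000, 2900):
    print(b, f"{eden_lr(b):.2e}")  # 4.49e-02, 4.42e-02, 4.35e-02, as logged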
2023-12-21 11:30:05,590 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.185e+01 2.601e+01 3.008e+01 6.428e+01, threshold=5.202e+01, percent-clipped=3.0
2023-12-21 11:30:05,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=12666.666666666666, ans=0.4566666666666667
2023-12-21 11:30:06,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=12666.666666666666, ans=0.125
2023-12-21 11:30:07,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=12666.666666666666, ans=0.125
2023-12-21 11:30:09,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=12.25
2023-12-21 11:30:20,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=12733.333333333334, ans=0.0
2023-12-21 11:30:23,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=12.275
2023-12-21 11:30:38,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=11.433333333333334
2023-12-21 11:30:38,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=17.15
2023-12-21 11:30:53,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=17.2
2023-12-21 11:30:57,341 INFO [train.py:886] (1/4) Epoch 1, batch 1950, loss[loss=0.01917, audio_tagging_loss=0.01917, over 24750.00 frames. ], tot_loss[loss=0.02122, audio_tagging_loss=0.02122, over 4945337.56 frames. ], batch size: 99, lr: 4.43e-02, grad_scale: 32.0
2023-12-21 11:31:02,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.35 vs. limit=17.25
2023-12-21 11:31:04,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13000.0, ans=0.125
2023-12-21 11:31:05,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.59 vs. limit=17.25
2023-12-21 11:31:12,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.08 vs. limit=11.533333333333333
2023-12-21 11:31:20,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=13133.333333333334, ans=0.125
2023-12-21 11:31:20,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=17.35
2023-12-21 11:31:25,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13133.333333333334, ans=0.125
2023-12-21 11:31:27,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=13133.333333333334, ans=0.4403333333333333
2023-12-21 11:31:27,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=12.425
2023-12-21 11:31:31,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=13200.0, ans=0.125
2023-12-21 11:31:42,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13266.666666666666, ans=0.125
2023-12-21 11:31:43,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13266.666666666666, ans=0.125
2023-12-21 11:31:49,736 INFO [train.py:886] (1/4) Epoch 1, batch 2000, loss[loss=0.02369, audio_tagging_loss=0.02369, over 24750.00 frames. ], tot_loss[loss=0.02097, audio_tagging_loss=0.02097, over 4945668.78 frames. ], batch size: 99, lr: 4.42e-02, grad_scale: 32.0
2023-12-21 11:31:51,644 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+01 2.224e+01 2.549e+01 2.855e+01 5.920e+01, threshold=5.098e+01, percent-clipped=1.0
2023-12-21 11:32:07,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=13400.0, ans=0.125
2023-12-21 11:32:09,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=17.55
2023-12-21 11:32:11,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=13466.666666666666, ans=0.125
2023-12-21 11:32:18,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=13466.666666666666, ans=0.0
2023-12-21 11:32:21,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13533.333333333334, ans=0.16466666666666666
2023-12-21 11:32:38,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=17.7
2023-12-21 11:32:44,302 INFO [train.py:886] (1/4) Epoch 1, batch 2050, loss[loss=0.01992, audio_tagging_loss=0.01992, over 25000.00 frames. ], tot_loss[loss=0.02081, audio_tagging_loss=0.02081, over 4949041.35 frames. ], batch size: 100, lr: 4.42e-02, grad_scale: 32.0
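Each batch summary pairs a per-batch loss[...] with a running tot_loss[...] whose frame count hovers near 4.95e+06 instead of growing without bound, which suggests an exponentially decayed, frames-weighted average. A sketch under that assumption (the decay constant is illustrative, not taken from icefall; note that with roughly 25000 frames per batch and decay 0.995 the effective frame count saturates at 25000 / 0.005 = 5e+06, close to the logged values):

class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames ("over N frames" above)

    def update(self, loss: float, frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * frames
        self.frames = self.decay * self.frames + frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)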
2023-12-21 11:35:08,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=14533.333333333334, ans=0.025
2023-12-21 11:35:08,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=14533.333333333334, ans=0.15466666666666667
2023-12-21 11:35:09,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. limit=12.266666666666667
2023-12-21 11:35:11,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=14600.0, ans=0.125
2023-12-21 11:35:22,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=14666.666666666666, ans=0.125
2023-12-21 11:35:23,973 INFO [train.py:886] (1/4) Epoch 1, batch 2200, loss[loss=0.02242, audio_tagging_loss=0.02242, over 24750.00 frames. ], tot_loss[loss=0.02062, audio_tagging_loss=0.02062, over 4949645.01 frames. ], batch size: 99, lr: 4.41e-02, grad_scale: 32.0
2023-12-21 11:35:25,955 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.344e+01 2.656e+01 2.983e+01 4.042e+01, threshold=5.311e+01, percent-clipped=0.0
2023-12-21 11:35:26,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.79 vs. limit=13.0
2023-12-21 11:35:27,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=5.2
2023-12-21 11:35:40,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.51 vs. limit=18.55
2023-12-21 11:35:42,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=14733.333333333334, ans=0.3843333333333333
2023-12-21 11:35:58,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=13.075
2023-12-21 11:36:12,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=14933.333333333334, ans=0.004444444444444438
2023-12-21 11:36:16,363 INFO [train.py:886] (1/4) Epoch 1, batch 2250, loss[loss=0.01898, audio_tagging_loss=0.01898, over 24750.00 frames. ], tot_loss[loss=0.02054, audio_tagging_loss=0.02054, over 4947491.84 frames. ], batch size: 99, lr: 4.40e-02, grad_scale: 64.0
2023-12-21 11:36:21,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=15000.0, ans=0.375
2023-12-21 11:36:22,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=13.125
2023-12-21 11:36:23,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=13.125
2023-12-21 11:36:26,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=15066.666666666666, ans=0.09899494936611666
2023-12-21 11:36:53,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.76 vs. limit=18.9
2023-12-21 11:36:55,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=15200.0, ans=0.007565217391304348
2023-12-21 11:37:05,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=15266.666666666666, ans=0.125
2023-12-21 11:37:07,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=15333.333333333334, ans=0.125
2023-12-21 11:37:08,451 INFO [train.py:886] (1/4) Epoch 1, batch 2300, loss[loss=0.02018, audio_tagging_loss=0.02018, over 25000.00 frames. ], tot_loss[loss=0.02033, audio_tagging_loss=0.02033, over 4948028.60 frames. ], batch size: 100, lr: 4.40e-02, grad_scale: 64.0
2023-12-21 11:37:10,374 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.341e+01 2.591e+01 2.957e+01 4.107e+01, threshold=5.182e+01, percent-clipped=0.0
2023-12-21 11:37:13,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=15333.333333333334, ans=0.125
2023-12-21 11:37:20,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=13.275
2023-12-21 11:37:22,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=15400.0, ans=0.125
2023-12-21 11:37:35,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=8.866666666666667
2023-12-21 11:37:50,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=15600.0, ans=0.125
2023-12-21 11:37:51,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=13.35
2023-12-21 11:38:01,169 INFO [train.py:886] (1/4) Epoch 1, batch 2350, loss[loss=0.0211, audio_tagging_loss=0.0211, over 22545.00 frames. ], tot_loss[loss=0.02023, audio_tagging_loss=0.02023, over 4946245.30 frames. ], batch size: 107, lr: 4.40e-02, grad_scale: 64.0
2023-12-21 11:38:02,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15666.666666666666, ans=0.14333333333333334
2023-12-21 11:38:06,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=15666.666666666666, ans=0.125
2023-12-21 11:38:07,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15666.666666666666, ans=0.14333333333333334
2023-12-21 11:38:36,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.73 vs. limit=19.4
2023-12-21 11:38:38,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=15866.666666666666, ans=0.125
2023-12-21 11:38:46,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=15933.333333333334, ans=0.07
2023-12-21 11:38:53,118 INFO [train.py:886] (1/4) Epoch 1, batch 2400, loss[loss=0.02071, audio_tagging_loss=0.02071, over 24750.00 frames. ], tot_loss[loss=0.02026, audio_tagging_loss=0.02026, over 4947942.64 frames. ], batch size: 99, lr: 4.39e-02, grad_scale: 64.0
2023-12-21 11:38:54,989 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.350e+01 2.627e+01 2.967e+01 3.953e+01, threshold=5.253e+01, percent-clipped=0.0
2023-12-21 11:38:55,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=16000.0, ans=0.125
2023-12-21 11:39:27,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=16200.0, ans=0.125
2023-12-21 11:39:38,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=16266.666666666666, ans=0.125
2023-12-21 11:39:39,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16266.666666666666, ans=0.13733333333333334
2023-12-21 11:39:39,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=13.6
2023-12-21 11:39:42,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=16266.666666666666, ans=0.125
2023-12-21 11:39:44,701 INFO [train.py:886] (1/4) Epoch 1, batch 2450, loss[loss=0.02006, audio_tagging_loss=0.02006, over 25000.00 frames. ], tot_loss[loss=0.02034, audio_tagging_loss=0.02034, over 4951020.43 frames. ], batch size: 100, lr: 4.39e-02, grad_scale: 64.0
2023-12-21 11:39:48,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=16333.333333333334, ans=0.0
2023-12-21 11:39:55,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.08 vs. limit=19.8
2023-12-21 11:40:12,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.41 vs. limit=13.233333333333334
2023-12-21 11:40:15,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=16533.333333333332, ans=0.125
2023-12-21 11:40:20,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=16533.333333333332, ans=0.125
2023-12-21 11:40:34,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=16600.0, ans=10.0
2023-12-21 11:40:36,792 INFO [train.py:886] (1/4) Epoch 1, batch 2500, loss[loss=0.02179, audio_tagging_loss=0.02179, over 24750.00 frames. ], tot_loss[loss=0.02039, audio_tagging_loss=0.02039, over 4950494.75 frames. ], batch size: 99, lr: 4.38e-02, grad_scale: 64.0
2023-12-21 11:40:38,712 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.472e+01 2.667e+01 3.044e+01 4.269e+01, threshold=5.334e+01, percent-clipped=0.0
2023-12-21 11:40:43,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=16666.666666666668, ans=0.31666666666666676
2023-12-21 11:40:46,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=16733.333333333332, ans=0.125
2023-12-21 11:40:47,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=9.183333333333334
2023-12-21 11:41:02,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=20.1
2023-12-21 11:41:03,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132
2023-12-21 11:41:07,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=16866.666666666668, ans=0.0
2023-12-21 11:41:14,909 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=2.565e-03
2023-12-21 11:41:27,286 INFO [train.py:886] (1/4) Epoch 1, batch 2550, loss[loss=0.02343, audio_tagging_loss=0.02343, over 24750.00 frames. ], tot_loss[loss=0.02043, audio_tagging_loss=0.02043, over 4950000.40 frames. ], batch size: 99, lr: 4.38e-02, grad_scale: 64.0
2023-12-21 11:41:29,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=5.55
2023-12-21 11:41:39,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=17066.666666666668, ans=10.826666666666668
2023-12-21 11:41:41,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17066.666666666668, ans=0.12933333333333333
2023-12-21 11:41:44,699 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.267e+01
2023-12-21 11:41:52,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=13.924999999999999
2023-12-21 11:42:10,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.75 vs. limit=20.450000000000003
2023-12-21 11:42:15,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=17266.666666666668, ans=0.0
2023-12-21 11:42:21,178 INFO [train.py:886] (1/4) Epoch 1, batch 2600, loss[loss=0.02114, audio_tagging_loss=0.02114, over 25000.00 frames. ], tot_loss[loss=0.02024, audio_tagging_loss=0.02024, over 4951346.55 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 64.0
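grad_scale in the batch summaries doubles from 16.0 through 32.0 to 64.0 over this stretch, the signature of dynamic loss scaling in fp16 training: the scale grows after a fixed number of overflow-free steps and is cut back when an overflow occurs. A generic PyTorch equivalent for illustration (icefall manages scaling inside its own optimizer loop, so this is not the recipe's actual code):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                   growth_factor=2.0,
                                   growth_interval=2000)
# Per training step:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)   # hypothetical helper
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()  # doubles the scale after `growth_interval` clean steps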
2023-12-21 11:42:23,105 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.528e+01 2.807e+01 3.292e+01 4.352e+01, threshold=5.614e+01, percent-clipped=0.0
2023-12-21 11:42:30,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=17400.0, ans=0.0
2023-12-21 11:42:30,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=17400.0, ans=0.125
2023-12-21 11:42:41,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=17400.0, ans=0.125
2023-12-21 11:42:43,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.10 vs. limit=20.6
2023-12-21 11:42:44,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=17466.666666666668, ans=0.28866666666666674
2023-12-21 11:42:49,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=17466.666666666668, ans=0.12533333333333332
2023-12-21 11:42:49,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. limit=5.62
2023-12-21 11:42:53,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=17533.333333333332, ans=0.125
2023-12-21 11:42:54,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=17533.333333333332, ans=0.0
2023-12-21 11:42:55,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=17533.333333333332, ans=0.05
2023-12-21 11:43:05,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=17600.0, ans=0.28400000000000003
2023-12-21 11:43:13,608 INFO [train.py:886] (1/4) Epoch 1, batch 2650, loss[loss=0.019, audio_tagging_loss=0.019, over 24111.00 frames. ], tot_loss[loss=0.02005, audio_tagging_loss=0.02005, over 4952331.57 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 64.0
2023-12-21 11:43:14,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=17666.666666666668, ans=0.0
2023-12-21 11:43:16,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=17666.666666666668, ans=0.125
2023-12-21 11:43:20,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=11.066666666666666
2023-12-21 11:43:22,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=14.125
2023-12-21 11:43:26,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17733.333333333332, ans=0.0
2023-12-21 11:43:42,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.52 vs. limit=20.85
2023-12-21 11:43:50,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=17866.666666666668, ans=0.125
2023-12-21 11:44:04,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=18000.0, ans=0.125
2023-12-21 11:44:05,200 INFO [train.py:886] (1/4) Epoch 1, batch 2700, loss[loss=0.01963, audio_tagging_loss=0.01963, over 25000.00 frames. ], tot_loss[loss=0.02002, audio_tagging_loss=0.02002, over 4957247.05 frames. ], batch size: 100, lr: 4.36e-02, grad_scale: 64.0
2023-12-21 11:44:07,123 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.548e+01 2.795e+01 3.093e+01 4.851e+01, threshold=5.589e+01, percent-clipped=0.0
2023-12-21 11:44:07,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=18000.0, ans=0.125
2023-12-21 11:44:12,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=18000.0, ans=0.27
2023-12-21 11:44:14,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=11.226666666666667
2023-12-21 11:44:49,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=14.35
2023-12-21 11:44:58,038 INFO [train.py:886] (1/4) Epoch 1, batch 2750, loss[loss=0.02067, audio_tagging_loss=0.02067, over 25000.00 frames. ], tot_loss[loss=0.01997, audio_tagging_loss=0.01997, over 4960747.72 frames. ], batch size: 100, lr: 4.36e-02, grad_scale: 64.0
2023-12-21 11:45:04,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.78 vs. limit=9.583333333333332
2023-12-21 11:45:32,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=18533.333333333332, ans=0.2513333333333334
2023-12-21 11:45:42,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=14.475
2023-12-21 11:45:49,228 INFO [train.py:886] (1/4) Epoch 1, batch 2800, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.02002, audio_tagging_loss=0.02002, over 4959119.51 frames. ], batch size: 100, lr: 4.36e-02, grad_scale: 64.0
2023-12-21 11:45:51,130 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.719e+01 3.067e+01 3.329e+01 4.208e+01, threshold=6.133e+01, percent-clipped=0.0
2023-12-21 11:45:53,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18666.666666666668, ans=0.0
2023-12-21 11:45:58,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18666.666666666668, ans=0.11333333333333331
2023-12-21 11:46:01,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=18733.333333333332, ans=0.125
2023-12-21 11:46:31,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.51 vs. limit=21.7
2023-12-21 11:46:41,868 INFO [train.py:886] (1/4) Epoch 1, batch 2850, loss[loss=0.0191, audio_tagging_loss=0.0191, over 24750.00 frames. ], tot_loss[loss=0.01995, audio_tagging_loss=0.01995, over 4953220.57 frames. ], batch size: 99, lr: 4.35e-02, grad_scale: 64.0
2023-12-21 11:46:51,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.67 vs. limit=21.8
2023-12-21 11:47:00,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=19066.666666666668, ans=0.10933333333333331
2023-12-21 11:47:01,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=19066.666666666668, ans=0.125
2023-12-21 11:47:03,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=19133.333333333332, ans=0.125
2023-12-21 11:47:06,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=19133.333333333332, ans=0.0
2023-12-21 11:47:14,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19200.0, ans=0.125
2023-12-21 11:47:18,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.98 vs. limit=14.6
2023-12-21 11:47:24,131 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.573e+00
2023-12-21 11:47:35,409 INFO [train.py:886] (1/4) Epoch 1, batch 2900, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01983, audio_tagging_loss=0.01983, over 4953450.68 frames. ], batch size: 100, lr: 4.35e-02, grad_scale: 64.0
2023-12-21 11:47:35,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=19333.333333333332, ans=0.0
2023-12-21 11:47:37,296 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.573e+01 2.906e+01 3.283e+01 4.730e+01, threshold=5.812e+01, percent-clipped=0.0
2023-12-21 11:47:39,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=19333.333333333332, ans=0.125
2023-12-21 11:47:39,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=11.733333333333333
2023-12-21 11:47:40,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=19333.333333333332, ans=0.125
2023-12-21 11:47:56,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=14.8
2023-12-21 11:48:00,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=19466.666666666668, ans=0.125
2023-12-21 11:48:13,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.91 vs.
limit=22.15 2023-12-21 11:48:17,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19600.0, ans=0.125 2023-12-21 11:48:26,188 INFO [train.py:886] (1/4) Epoch 1, batch 2950, loss[loss=0.01844, audio_tagging_loss=0.01844, over 24750.00 frames. ], tot_loss[loss=0.01961, audio_tagging_loss=0.01961, over 4953777.82 frames. ], batch size: 99, lr: 4.34e-02, grad_scale: 64.0 2023-12-21 11:48:39,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=19733.333333333332, ans=0.0 2023-12-21 11:48:42,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.02 vs. limit=22.3 2023-12-21 11:48:49,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=19800.0, ans=0.125 2023-12-21 11:48:52,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=19800.0, ans=0.0 2023-12-21 11:48:58,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.08 vs. limit=14.95 2023-12-21 11:49:00,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.4 2023-12-21 11:49:20,068 INFO [train.py:886] (1/4) Epoch 1, batch 3000, loss[loss=0.01937, audio_tagging_loss=0.01937, over 24750.00 frames. ], tot_loss[loss=0.01959, audio_tagging_loss=0.01959, over 4951463.07 frames. ], batch size: 99, lr: 4.34e-02, grad_scale: 64.0 2023-12-21 11:49:20,069 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 11:49:45,414 INFO [train.py:917] (1/4) Epoch 1, validation: loss=0.04441, audio_tagging_loss=0.04441, over 3737520.00 frames. 2023-12-21 11:49:45,415 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 11:49:46,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.42 vs. limit=15.0 2023-12-21 11:49:47,302 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.623e+01 2.967e+01 3.286e+01 5.413e+01, threshold=5.933e+01, percent-clipped=0.0 2023-12-21 11:49:48,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.90 vs. limit=10.0 2023-12-21 11:50:01,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=20066.666666666668, ans=0.0 2023-12-21 11:50:06,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=15.0 2023-12-21 11:50:09,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=20133.333333333332, ans=0.125 2023-12-21 11:50:22,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0 2023-12-21 11:50:34,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.81 vs. 
limit=15.0 2023-12-21 11:50:35,813 INFO [train.py:886] (1/4) Epoch 1, batch 3050, loss[loss=0.02287, audio_tagging_loss=0.02287, over 25000.00 frames. ], tot_loss[loss=0.01951, audio_tagging_loss=0.01951, over 4956473.51 frames. ], batch size: 100, lr: 4.33e-02, grad_scale: 64.0 2023-12-21 11:50:39,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=20333.333333333332, ans=0.5 2023-12-21 11:50:49,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20400.0, ans=0.1 2023-12-21 11:50:49,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=20400.0, ans=0.125 2023-12-21 11:50:53,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=20400.0, ans=0.0 2023-12-21 11:51:11,249 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=6.300e-02 2023-12-21 11:51:14,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=20533.333333333332, ans=0.125 2023-12-21 11:51:21,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.23 vs. limit=10.0 2023-12-21 11:51:23,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2023-12-21 11:51:27,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.97 vs. limit=10.0 2023-12-21 11:51:29,138 INFO [train.py:886] (1/4) Epoch 1, batch 3100, loss[loss=0.01769, audio_tagging_loss=0.01769, over 24750.00 frames. ], tot_loss[loss=0.01953, audio_tagging_loss=0.01953, over 4960721.09 frames. ], batch size: 99, lr: 4.33e-02, grad_scale: 64.0 2023-12-21 11:51:31,049 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.608e+01 2.817e+01 3.164e+01 4.242e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-21 11:51:37,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=20733.333333333332, ans=0.125 2023-12-21 11:51:39,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=20733.333333333332, ans=0.125 2023-12-21 11:52:00,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-12-21 11:52:06,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=20866.666666666668, ans=0.0 2023-12-21 11:52:06,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=20866.666666666668, ans=0.2 2023-12-21 11:52:13,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=20933.333333333332, ans=0.125 2023-12-21 11:52:20,862 INFO [train.py:886] (1/4) Epoch 1, batch 3150, loss[loss=0.01788, audio_tagging_loss=0.01788, over 24750.00 frames. 
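The ScheduledFloat entries track hyperparameters annealed against batch_count. The logged values fit piecewise-linear interpolation between breakpoints: feed_forward1.out_proj.dropout_p reads 0.132 at batch_count=16800.0 earlier in this block and 0.1 at batch_count=20400.0 above, which lies on a straight ramp ending at 0.1 around batch 20000 (extrapolating back to 0.3 at batch 0) and clamped afterwards. A self-contained sketch of such a schedule (the class name and breakpoints are illustrative, not icefall's scaling.py):

    class PiecewiseLinearFloat:
        """A float that ramps linearly between (batch_count, value) breakpoints
        and clamps to the endpoint values outside the covered range."""

        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count):
            if batch_count <= self.points[0][0]:
                return self.points[0][1]
            if batch_count >= self.points[-1][0]:
                return self.points[-1][1]
            for (a, ya), (b, yb) in zip(self.points, self.points[1:]):
                if a <= batch_count <= b:
                    return ya + (batch_count - a) / (b - a) * (yb - ya)

    dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(16800.0))  # 0.132, matching the dropout_p entry in this log
    print(dropout_p(20400.0))  # 0.1 (clamped), as logged at batch_count=20400.0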
], tot_loss[loss=0.01966, audio_tagging_loss=0.01966, over 4952502.56 frames. ], batch size: 99, lr: 4.32e-02, grad_scale: 64.0 2023-12-21 11:52:22,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=21000.0, ans=0.2 2023-12-21 11:52:24,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=21000.0, ans=0.125 2023-12-21 11:52:25,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=21000.0, ans=0.125 2023-12-21 11:52:56,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=21200.0, ans=0.006260869565217392 2023-12-21 11:52:57,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.35 vs. limit=15.0 2023-12-21 11:53:07,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=21266.666666666668, ans=0.006246376811594203 2023-12-21 11:53:09,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=21266.666666666668, ans=0.125 2023-12-21 11:53:11,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=21266.666666666668, ans=0.125 2023-12-21 11:53:13,091 INFO [train.py:886] (1/4) Epoch 1, batch 3200, loss[loss=0.01744, audio_tagging_loss=0.01744, over 24750.00 frames. ], tot_loss[loss=0.01975, audio_tagging_loss=0.01975, over 4950608.21 frames. ], batch size: 99, lr: 4.32e-02, grad_scale: 64.0 2023-12-21 11:53:14,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.755e+01 2.973e+01 3.408e+01 4.303e+01, threshold=5.945e+01, percent-clipped=0.0 2023-12-21 11:53:17,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2023-12-21 11:53:22,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0 2023-12-21 11:53:33,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=21466.666666666668, ans=0.125 2023-12-21 11:53:37,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=15.0 2023-12-21 11:53:39,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=21466.666666666668, ans=0.0 2023-12-21 11:53:43,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=21533.333333333332, ans=0.2 2023-12-21 11:53:47,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=21533.333333333332, ans=0.125 2023-12-21 11:54:05,820 INFO [train.py:886] (1/4) Epoch 1, batch 3250, loss[loss=0.01843, audio_tagging_loss=0.01843, over 25000.00 frames. ], tot_loss[loss=0.01947, audio_tagging_loss=0.01947, over 4948648.97 frames. 
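The learning rate in the batch summaries decays smoothly: 4.32e-02 around batch 3200 above, 4.23e-02 by batch 4000, and 4.05e-02 once epoch 2 starts. These values are consistent with icefall's Eden schedule, assuming base_lr=0.045, lr_batches=7500 and lr_epochs=3.5: the base rate is multiplied by an inverse-quartic-root factor in the global batch index and another in the number of completed epochs. A hedged sketch of the formula (counting completed epochs rather than the 1-based epoch number is an assumption, but it is what reproduces the logged values):

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # lr = base_lr * ((b^2 + B^2) / B^2)^-0.25 * ((e^2 + E^2) / E^2)^-0.25
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, batch=3200, epoch=0))  # ~4.32e-02, as logged above
    print(eden_lr(0.045, batch=4000, epoch=0))  # ~4.23e-02, logged at batch 4000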
], batch size: 100, lr: 4.31e-02, grad_scale: 64.0 2023-12-21 11:54:06,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=21666.666666666668, ans=0.125 2023-12-21 11:54:10,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21666.666666666668, ans=0.125 2023-12-21 11:54:27,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=21800.0, ans=0.2 2023-12-21 11:54:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=21800.0, ans=0.07 2023-12-21 11:54:32,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=21800.0, ans=0.0 2023-12-21 11:54:56,783 INFO [train.py:886] (1/4) Epoch 1, batch 3300, loss[loss=0.01804, audio_tagging_loss=0.01804, over 24750.00 frames. ], tot_loss[loss=0.01932, audio_tagging_loss=0.01932, over 4948975.46 frames. ], batch size: 99, lr: 4.31e-02, grad_scale: 64.0 2023-12-21 11:54:59,350 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.622e+01 2.937e+01 3.224e+01 4.411e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-21 11:55:01,279 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.465e+00 2023-12-21 11:55:04,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-12-21 11:55:11,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=22066.666666666668, ans=0.0 2023-12-21 11:55:14,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=15.0 2023-12-21 11:55:26,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=15.0 2023-12-21 11:55:28,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22200.0, ans=0.125 2023-12-21 11:55:29,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.55 vs. limit=22.5 2023-12-21 11:55:29,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.51 vs. limit=22.5 2023-12-21 11:55:33,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.28 vs. limit=22.5 2023-12-21 11:55:50,062 INFO [train.py:886] (1/4) Epoch 1, batch 3350, loss[loss=0.01942, audio_tagging_loss=0.01942, over 25000.00 frames. ], tot_loss[loss=0.01936, audio_tagging_loss=0.01936, over 4948662.49 frames. 
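The scaling.py "Whitening" lines fire when a module's whitening metric drifts past its (often scheduled) limit; the metric gauges how far the channel covariance of the activations is from a multiple of the identity, with 1.0 meaning perfectly white and large values (such as metric=44.28 vs. limit=22.5 above) meaning a few directions dominate. The exact formula lives in icefall's scaling.py; the eigenvalue-ratio form below is one standard way to write such a measure and is an assumption, not a quote of that source:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations. Per channel group,
        # mean(eig^2) / mean(eig)^2 over the covariance eigenvalues is 1.0
        # iff all eigenvalues are equal, i.e. the features are white.
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        metric = 0.0
        for g in range(num_groups):
            xg = x[:, g, :]
            xg = xg - xg.mean(dim=0, keepdim=True)
            cov = xg.t() @ xg / num_frames
            eigs = torch.linalg.eigvalsh(cov)
            metric += (eigs.pow(2).mean() / eigs.mean().pow(2)).item()
        return metric / num_groups

When the metric exceeds its limit, the module nudges the activations back toward a whiter covariance, which is why the log only reports the cases where the limit is breached or closely approached.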
], batch size: 100, lr: 4.30e-02, grad_scale: 64.0 2023-12-21 11:56:01,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=22400.0, ans=0.0 2023-12-21 11:56:04,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=22400.0, ans=0.125 2023-12-21 11:56:17,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=22466.666666666668, ans=0.0059855072463768115 2023-12-21 11:56:43,115 INFO [train.py:886] (1/4) Epoch 1, batch 3400, loss[loss=0.02111, audio_tagging_loss=0.02111, over 25000.00 frames. ], tot_loss[loss=0.01941, audio_tagging_loss=0.01941, over 4952076.20 frames. ], batch size: 100, lr: 4.29e-02, grad_scale: 64.0 2023-12-21 11:56:45,032 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.691e+01 2.947e+01 3.309e+01 4.555e+01, threshold=5.894e+01, percent-clipped=0.0 2023-12-21 11:56:45,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=22666.666666666668, ans=0.005942028985507246 2023-12-21 11:56:46,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-12-21 11:56:53,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.46 vs. limit=22.5 2023-12-21 11:57:00,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=22733.333333333332, ans=0.07 2023-12-21 11:57:02,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=22800.0, ans=0.2 2023-12-21 11:57:17,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=22866.666666666668, ans=0.95 2023-12-21 11:57:29,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=22933.333333333332, ans=0.09899494936611666 2023-12-21 11:57:30,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=22933.333333333332, ans=0.125 2023-12-21 11:57:33,814 INFO [train.py:886] (1/4) Epoch 1, batch 3450, loss[loss=0.0201, audio_tagging_loss=0.0201, over 24750.00 frames. ], tot_loss[loss=0.01947, audio_tagging_loss=0.01947, over 4945144.19 frames. ], batch size: 99, lr: 4.29e-02, grad_scale: 64.0 2023-12-21 11:57:49,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=23066.666666666668, ans=0.125 2023-12-21 11:57:54,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=23133.333333333332, ans=0.0 2023-12-21 11:57:58,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=23133.333333333332, ans=0.125 2023-12-21 11:58:09,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=23200.0, ans=0.00582608695652174 2023-12-21 11:58:24,318 INFO [train.py:886] (1/4) Epoch 1, batch 3500, loss[loss=0.02127, audio_tagging_loss=0.02127, over 25000.00 frames. 
], tot_loss[loss=0.01948, audio_tagging_loss=0.01948, over 4941720.49 frames. ], batch size: 100, lr: 4.28e-02, grad_scale: 64.0 2023-12-21 11:58:26,217 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.654e+01 2.914e+01 3.165e+01 4.933e+01, threshold=5.829e+01, percent-clipped=0.0 2023-12-21 11:58:32,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=23333.333333333332, ans=0.07 2023-12-21 11:59:00,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=15.0 2023-12-21 11:59:02,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=23533.333333333332, ans=0.2 2023-12-21 11:59:02,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=12.0 2023-12-21 11:59:15,526 INFO [train.py:886] (1/4) Epoch 1, batch 3550, loss[loss=0.01904, audio_tagging_loss=0.01904, over 24750.00 frames. ], tot_loss[loss=0.01931, audio_tagging_loss=0.01931, over 4936917.01 frames. ], batch size: 99, lr: 4.28e-02, grad_scale: 64.0 2023-12-21 11:59:26,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=23733.333333333332, ans=0.07 2023-12-21 11:59:31,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=23733.333333333332, ans=0.125 2023-12-21 11:59:48,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=23866.666666666668, ans=0.1 2023-12-21 12:00:05,898 INFO [train.py:886] (1/4) Epoch 1, batch 3600, loss[loss=0.01997, audio_tagging_loss=0.01997, over 24750.00 frames. ], tot_loss[loss=0.01912, audio_tagging_loss=0.01912, over 4941542.19 frames. ], batch size: 99, lr: 4.27e-02, grad_scale: 64.0 2023-12-21 12:00:07,818 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.526e+01 2.849e+01 3.295e+01 5.645e+01, threshold=5.698e+01, percent-clipped=0.0 2023-12-21 12:00:12,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24000.0, ans=0.1 2023-12-21 12:00:19,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=24066.666666666668, ans=0.2 2023-12-21 12:00:25,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=24133.333333333332, ans=0.125 2023-12-21 12:00:32,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=15.0 2023-12-21 12:00:35,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=24200.0, ans=0.0 2023-12-21 12:00:45,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24266.666666666668, ans=0.125 2023-12-21 12:00:57,630 INFO [train.py:886] (1/4) Epoch 1, batch 3650, loss[loss=0.01568, audio_tagging_loss=0.01568, over 25000.00 frames. ], tot_loss[loss=0.01909, audio_tagging_loss=0.01909, over 4941855.38 frames. 
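Each batch summary above reports the current batch's loss "over N frames" next to a running tot_loss averaged over roughly 4.9-5.0 million frames. That behaviour is characteristic of an exponentially decayed, frame-weighted average; the decay constant below is an assumption chosen so the steady-state window (batch_frames / (1 - decay) = 25000 / 0.005 = 5.0e6 frames) lands near the figures in the log:

    def update_tot_loss(tot_loss, tot_frames, batch_loss, batch_frames, decay=0.995):
        # Decay the old frame-weighted sum, add the new batch, renormalise.
        weighted_sum = decay * tot_loss * tot_frames + batch_loss * batch_frames
        tot_frames = decay * tot_frames + batch_frames
        return weighted_sum / tot_frames, tot_frames

    tot_loss, tot_frames = 0.0, 0.0
    for batch_loss in (0.0207, 0.0196, 0.0188):  # per-batch losses
        tot_loss, tot_frames = update_tot_loss(tot_loss, tot_frames,
                                               batch_loss, 25000.0)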
], batch size: 100, lr: 4.27e-02, grad_scale: 64.0 2023-12-21 12:01:01,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=24333.333333333332, ans=0.125 2023-12-21 12:01:47,747 INFO [train.py:886] (1/4) Epoch 1, batch 3700, loss[loss=0.02222, audio_tagging_loss=0.02222, over 25000.00 frames. ], tot_loss[loss=0.01919, audio_tagging_loss=0.01919, over 4942413.41 frames. ], batch size: 100, lr: 4.26e-02, grad_scale: 64.0 2023-12-21 12:01:49,608 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.529e+01 2.856e+01 3.179e+01 4.127e+01, threshold=5.712e+01, percent-clipped=0.0 2023-12-21 12:02:00,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=24733.333333333332, ans=0.005492753623188407 2023-12-21 12:02:04,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2023-12-21 12:02:08,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=24800.0, ans=0.125 2023-12-21 12:02:12,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24800.0, ans=0.1 2023-12-21 12:02:18,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=15.0 2023-12-21 12:02:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=24866.666666666668, ans=0.0 2023-12-21 12:02:21,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24866.666666666668, ans=0.1 2023-12-21 12:02:33,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-12-21 12:02:34,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.47 vs. limit=22.5 2023-12-21 12:02:38,544 INFO [train.py:886] (1/4) Epoch 1, batch 3750, loss[loss=0.01813, audio_tagging_loss=0.01813, over 24750.00 frames. ], tot_loss[loss=0.0193, audio_tagging_loss=0.0193, over 4944346.72 frames. ], batch size: 99, lr: 4.26e-02, grad_scale: 64.0 2023-12-21 12:02:38,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25000.0, ans=0.1 2023-12-21 12:02:40,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=12.0 2023-12-21 12:02:48,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=25066.666666666668, ans=0.125 2023-12-21 12:02:56,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.26 vs. limit=22.5 2023-12-21 12:03:02,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. 
limit=15.0 2023-12-21 12:03:07,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=25133.333333333332, ans=0.0 2023-12-21 12:03:17,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=25200.0, ans=0.125 2023-12-21 12:03:22,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.82 vs. limit=22.5 2023-12-21 12:03:28,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0 2023-12-21 12:03:28,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=25266.666666666668, ans=0.2 2023-12-21 12:03:30,820 INFO [train.py:886] (1/4) Epoch 1, batch 3800, loss[loss=0.02315, audio_tagging_loss=0.02315, over 21573.00 frames. ], tot_loss[loss=0.01935, audio_tagging_loss=0.01935, over 4935085.06 frames. ], batch size: 107, lr: 4.25e-02, grad_scale: 64.0 2023-12-21 12:03:32,664 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.585e+01 2.873e+01 3.239e+01 4.281e+01, threshold=5.745e+01, percent-clipped=0.0 2023-12-21 12:03:37,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2023-12-21 12:04:00,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=25533.333333333332, ans=0.0 2023-12-21 12:04:07,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=25533.333333333332, ans=0.125 2023-12-21 12:04:09,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25533.333333333332, ans=0.125 2023-12-21 12:04:12,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0 2023-12-21 12:04:13,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25600.0, ans=0.0 2023-12-21 12:04:13,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25600.0, ans=0.1 2023-12-21 12:04:14,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.89 vs. limit=15.0 2023-12-21 12:04:20,856 INFO [train.py:886] (1/4) Epoch 1, batch 3850, loss[loss=0.01914, audio_tagging_loss=0.01914, over 25000.00 frames. ], tot_loss[loss=0.01916, audio_tagging_loss=0.01916, over 4938181.05 frames. ], batch size: 100, lr: 4.24e-02, grad_scale: 64.0 2023-12-21 12:04:40,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.26 vs. 
limit=6.0 2023-12-21 12:04:43,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25800.0, ans=0.1 2023-12-21 12:04:45,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-12-21 12:04:47,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25800.0, ans=0.1 2023-12-21 12:04:54,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2023-12-21 12:05:07,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25933.333333333332, ans=0.1 2023-12-21 12:05:12,882 INFO [train.py:886] (1/4) Epoch 1, batch 3900, loss[loss=0.01866, audio_tagging_loss=0.01866, over 24750.00 frames. ], tot_loss[loss=0.01904, audio_tagging_loss=0.01904, over 4938257.73 frames. ], batch size: 99, lr: 4.24e-02, grad_scale: 64.0 2023-12-21 12:05:14,770 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.615e+01 2.835e+01 3.211e+01 6.050e+01, threshold=5.671e+01, percent-clipped=1.0 2023-12-21 12:05:18,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-21 12:05:20,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26000.0, ans=0.1 2023-12-21 12:05:23,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=16.84 vs. limit=15.0 2023-12-21 12:05:25,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=26066.666666666668, ans=0.125 2023-12-21 12:05:27,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=26066.666666666668, ans=0.125 2023-12-21 12:05:28,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.78 vs. limit=15.0 2023-12-21 12:05:32,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=12.0 2023-12-21 12:05:45,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=26200.0, ans=0.0 2023-12-21 12:05:47,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. 
limit=6.0 2023-12-21 12:05:50,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=26200.0, ans=0.125 2023-12-21 12:05:55,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=26266.666666666668, ans=0.0 2023-12-21 12:05:59,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=26266.666666666668, ans=0.0 2023-12-21 12:06:01,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26266.666666666668, ans=0.1 2023-12-21 12:06:05,270 INFO [train.py:886] (1/4) Epoch 1, batch 3950, loss[loss=0.02254, audio_tagging_loss=0.02254, over 24750.00 frames. ], tot_loss[loss=0.01904, audio_tagging_loss=0.01904, over 4943255.78 frames. ], batch size: 99, lr: 4.23e-02, grad_scale: 64.0 2023-12-21 12:06:08,386 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.546e+01 2023-12-21 12:06:16,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.15 vs. limit=15.0 2023-12-21 12:06:26,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=26466.666666666668, ans=0.0 2023-12-21 12:06:44,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=26533.333333333332, ans=0.125 2023-12-21 12:06:44,421 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.234e+00 2023-12-21 12:06:49,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=26600.0, ans=0.0 2023-12-21 12:06:53,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26600.0, ans=0.125 2023-12-21 12:06:54,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=26666.666666666668, ans=0.125 2023-12-21 12:06:54,976 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.510e+01 2023-12-21 12:06:57,689 INFO [train.py:886] (1/4) Epoch 1, batch 4000, loss[loss=0.01952, audio_tagging_loss=0.01952, over 25000.00 frames. ], tot_loss[loss=0.01901, audio_tagging_loss=0.01901, over 4950043.08 frames. ], batch size: 100, lr: 4.23e-02, grad_scale: 64.0 2023-12-21 12:06:59,528 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+01 2.600e+01 2.855e+01 3.213e+01 4.653e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-21 12:07:00,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=26666.666666666668, ans=0.005072463768115942 2023-12-21 12:07:02,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. 
limit=15.0 2023-12-21 12:07:14,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=26733.333333333332, ans=0.2 2023-12-21 12:07:31,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.49 vs. limit=22.5 2023-12-21 12:07:32,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=26866.666666666668, ans=0.125 2023-12-21 12:07:50,892 INFO [train.py:886] (1/4) Epoch 1, batch 4050, loss[loss=0.02168, audio_tagging_loss=0.02168, over 25000.00 frames. ], tot_loss[loss=0.01909, audio_tagging_loss=0.01909, over 4948845.43 frames. ], batch size: 100, lr: 4.22e-02, grad_scale: 64.0 2023-12-21 12:07:51,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=27000.0, ans=0.04949747468305833 2023-12-21 12:07:58,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27000.0, ans=0.1 2023-12-21 12:08:05,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=27066.666666666668, ans=0.025 2023-12-21 12:08:06,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2023-12-21 12:08:10,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=27133.333333333332, ans=0.0 2023-12-21 12:08:16,253 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.936e+00 2023-12-21 12:08:21,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=27200.0, ans=0.04949747468305833 2023-12-21 12:08:25,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=27200.0, ans=0.125 2023-12-21 12:08:26,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=27200.0, ans=0.0 2023-12-21 12:08:26,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-12-21 12:08:27,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=27200.0, ans=0.125 2023-12-21 12:08:36,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-12-21 12:08:41,512 INFO [train.py:886] (1/4) Epoch 1, batch 4100, loss[loss=0.02063, audio_tagging_loss=0.02063, over 24750.00 frames. ], tot_loss[loss=0.01927, audio_tagging_loss=0.01927, over 4949295.64 frames. 
], batch size: 99, lr: 4.22e-02, grad_scale: 64.0 2023-12-21 12:08:44,142 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.558e+01 2.802e+01 3.131e+01 4.356e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 12:08:58,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=27400.0, ans=0.2 2023-12-21 12:09:07,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27466.666666666668, ans=0.1 2023-12-21 12:09:09,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0 2023-12-21 12:09:22,014 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.138e+01 2023-12-21 12:09:27,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=27600.0, ans=0.125 2023-12-21 12:09:34,835 INFO [train.py:886] (1/4) Epoch 1, batch 4150, loss[loss=0.01767, audio_tagging_loss=0.01767, over 24750.00 frames. ], tot_loss[loss=0.01929, audio_tagging_loss=0.01929, over 4946375.03 frames. ], batch size: 99, lr: 4.21e-02, grad_scale: 64.0 2023-12-21 12:09:38,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=27666.666666666668, ans=0.0 2023-12-21 12:09:42,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=27666.666666666668, ans=0.125 2023-12-21 12:09:43,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=27733.333333333332, ans=0.2 2023-12-21 12:09:46,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=27733.333333333332, ans=0.125 2023-12-21 12:10:00,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=27800.0, ans=0.125 2023-12-21 12:10:01,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.12 vs. limit=22.5 2023-12-21 12:10:03,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=27800.0, ans=0.0 2023-12-21 12:10:04,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=15.0 2023-12-21 12:10:05,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=27866.666666666668, ans=0.1 2023-12-21 12:10:07,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=27866.666666666668, ans=0.00481159420289855 2023-12-21 12:10:10,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=27866.666666666668, ans=0.125 2023-12-21 12:10:12,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.55 vs. 
limit=22.5 2023-12-21 12:10:27,468 INFO [train.py:886] (1/4) Epoch 1, batch 4200, loss[loss=0.01754, audio_tagging_loss=0.01754, over 25000.00 frames. ], tot_loss[loss=0.01922, audio_tagging_loss=0.01922, over 4945081.37 frames. ], batch size: 100, lr: 4.20e-02, grad_scale: 64.0 2023-12-21 12:10:30,046 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.568e+01 2.812e+01 3.182e+01 3.944e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 12:10:58,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=28200.0, ans=0.004739130434782609 2023-12-21 12:11:18,582 INFO [train.py:886] (1/4) Epoch 1, batch 4250, loss[loss=0.0207, audio_tagging_loss=0.0207, over 23999.00 frames. ], tot_loss[loss=0.01918, audio_tagging_loss=0.01918, over 4942378.43 frames. ], batch size: 100, lr: 4.20e-02, grad_scale: 128.0 2023-12-21 12:11:31,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=28400.0, ans=0.125 2023-12-21 12:11:31,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2023-12-21 12:11:38,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=28400.0, ans=0.0 2023-12-21 12:11:47,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=12.0 2023-12-21 12:11:55,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5 2023-12-21 12:11:59,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=28533.333333333332, ans=0.05 2023-12-21 12:12:04,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=28600.0, ans=0.125 2023-12-21 12:12:11,738 INFO [train.py:886] (1/4) Epoch 1, batch 4300, loss[loss=0.01803, audio_tagging_loss=0.01803, over 25000.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4947158.39 frames. 
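With fp16 training enabled, the grad_scale figure in each summary is the dynamic loss-scaling factor; it doubles from 64.0 to 128.0 at batch 4250 above, the signature of a scaler that grows after a long enough run of overflow-free steps and backs off when gradients overflow. icefall drives this from its own training loop, but the stock PyTorch scaler illustrates the same mechanism:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def fp16_step(model, optimizer, criterion, features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
        scaler.update()                # doubles the scale after `growth_interval`
                                       # consecutive steps without overflow
        return loss.detach()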
], batch size: 100, lr: 4.19e-02, grad_scale: 128.0 2023-12-21 12:12:13,641 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.582e+01 2.869e+01 3.269e+01 4.965e+01, threshold=5.738e+01, percent-clipped=0.0 2023-12-21 12:12:14,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=28666.666666666668, ans=0.125 2023-12-21 12:12:18,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=28666.666666666668, ans=0.0 2023-12-21 12:12:21,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=28733.333333333332, ans=0.2 2023-12-21 12:12:28,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=28733.333333333332, ans=0.2 2023-12-21 12:12:30,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=28800.0, ans=0.0 2023-12-21 12:12:31,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=28800.0, ans=0.125 2023-12-21 12:12:36,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=28800.0, ans=0.0 2023-12-21 12:12:51,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28866.666666666668, ans=0.1 2023-12-21 12:13:03,097 INFO [train.py:886] (1/4) Epoch 1, batch 4350, loss[loss=0.0188, audio_tagging_loss=0.0188, over 24750.00 frames. ], tot_loss[loss=0.01913, audio_tagging_loss=0.01913, over 4946477.36 frames. ], batch size: 99, lr: 4.19e-02, grad_scale: 128.0 2023-12-21 12:13:03,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=29000.0, ans=0.125 2023-12-21 12:13:09,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29000.0, ans=0.125 2023-12-21 12:13:16,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29066.666666666668, ans=0.1 2023-12-21 12:13:40,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29200.0, ans=0.1 2023-12-21 12:13:56,074 INFO [train.py:886] (1/4) Epoch 1, batch 4400, loss[loss=0.01845, audio_tagging_loss=0.01845, over 24750.00 frames. ], tot_loss[loss=0.01922, audio_tagging_loss=0.01922, over 4945624.14 frames. ], batch size: 99, lr: 4.18e-02, grad_scale: 128.0 2023-12-21 12:13:57,949 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.596e+01 2.831e+01 3.124e+01 4.949e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-21 12:13:58,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-12-21 12:13:59,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=22.22 vs. limit=22.5 2023-12-21 12:14:02,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.31 vs. 
limit=10.0 2023-12-21 12:14:03,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-12-21 12:14:06,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=29400.0, ans=0.0 2023-12-21 12:14:12,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=29400.0, ans=0.125 2023-12-21 12:14:16,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=29466.666666666668, ans=0.125 2023-12-21 12:14:28,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2023-12-21 12:14:42,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.95 vs. limit=22.5 2023-12-21 12:14:48,780 INFO [train.py:886] (1/4) Epoch 1, batch 4450, loss[loss=0.02142, audio_tagging_loss=0.02142, over 25000.00 frames. ], tot_loss[loss=0.01919, audio_tagging_loss=0.01919, over 4942252.89 frames. ], batch size: 100, lr: 4.17e-02, grad_scale: 128.0 2023-12-21 12:14:51,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=29666.666666666668, ans=15.0 2023-12-21 12:15:15,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.74 vs. limit=15.0 2023-12-21 12:15:16,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.09 vs. limit=22.5 2023-12-21 12:15:17,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=29800.0, ans=0.004391304347826087 2023-12-21 12:15:22,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.87 vs. limit=22.5 2023-12-21 12:15:24,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29866.666666666668, ans=0.125 2023-12-21 12:15:32,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=29933.333333333332, ans=0.0043623188405797104 2023-12-21 12:15:40,551 INFO [train.py:886] (1/4) Epoch 1, batch 4500, loss[loss=0.01876, audio_tagging_loss=0.01876, over 25000.00 frames. ], tot_loss[loss=0.01903, audio_tagging_loss=0.01903, over 4943590.87 frames. ], batch size: 100, lr: 4.17e-02, grad_scale: 128.0 2023-12-21 12:15:43,807 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 2.897e+01 3.074e+01 4.883e+01, threshold=5.793e+01, percent-clipped=0.0 2023-12-21 12:16:10,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=30133.333333333332, ans=0.004318840579710145 2023-12-21 12:16:18,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.97 vs. 
limit=22.5 2023-12-21 12:16:25,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.34 vs. limit=22.5 2023-12-21 12:16:34,306 INFO [train.py:886] (1/4) Epoch 1, batch 4550, loss[loss=0.01992, audio_tagging_loss=0.01992, over 25000.00 frames. ], tot_loss[loss=0.01903, audio_tagging_loss=0.01903, over 4946880.95 frames. ], batch size: 100, lr: 4.16e-02, grad_scale: 128.0 2023-12-21 12:16:37,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=30333.333333333332, ans=0.0 2023-12-21 12:16:48,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.56 vs. limit=15.0 2023-12-21 12:16:50,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=30400.0, ans=0.125 2023-12-21 12:16:51,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=30400.0, ans=0.1 2023-12-21 12:16:55,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2023-12-21 12:17:06,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=30533.333333333332, ans=0.0 2023-12-21 12:17:11,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=30533.333333333332, ans=0.07 2023-12-21 12:17:15,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=30600.0, ans=0.0 2023-12-21 12:17:27,228 INFO [train.py:886] (1/4) Epoch 1, batch 4600, loss[loss=0.02143, audio_tagging_loss=0.02143, over 25000.00 frames. ], tot_loss[loss=0.01899, audio_tagging_loss=0.01899, over 4946640.33 frames. ], batch size: 100, lr: 4.15e-02, grad_scale: 128.0 2023-12-21 12:17:29,157 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.518e+01 2.768e+01 3.063e+01 4.476e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 12:17:30,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.72 vs. limit=22.5 2023-12-21 12:17:35,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-12-21 12:17:36,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=30733.333333333332, ans=0.125 2023-12-21 12:17:44,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=30733.333333333332, ans=0.0 2023-12-21 12:17:46,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=30800.0, ans=0.1 2023-12-21 12:17:49,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=30800.0, ans=0.125 2023-12-21 12:17:58,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.66 vs. 
limit=22.5 2023-12-21 12:18:08,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30933.333333333332, ans=0.125 2023-12-21 12:18:17,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=30933.333333333332, ans=0.125 2023-12-21 12:18:17,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2023-12-21 12:18:18,732 INFO [train.py:886] (1/4) Epoch 1, batch 4650, loss[loss=0.01807, audio_tagging_loss=0.01807, over 25000.00 frames. ], tot_loss[loss=0.01894, audio_tagging_loss=0.01894, over 4953257.32 frames. ], batch size: 100, lr: 4.15e-02, grad_scale: 128.0 2023-12-21 12:18:21,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=31000.0, ans=0.2 2023-12-21 12:18:28,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=31066.666666666668, ans=0.125 2023-12-21 12:18:54,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=31200.0, ans=0.125 2023-12-21 12:19:10,783 INFO [train.py:886] (1/4) Epoch 1, batch 4700, loss[loss=0.01727, audio_tagging_loss=0.01727, over 24750.00 frames. ], tot_loss[loss=0.01898, audio_tagging_loss=0.01898, over 4949585.18 frames. ], batch size: 99, lr: 4.14e-02, grad_scale: 128.0 2023-12-21 12:19:12,552 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.530e+01 2.695e+01 2.950e+01 3.950e+01, threshold=5.391e+01, percent-clipped=0.0 2023-12-21 12:19:14,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=31333.333333333332, ans=0.125 2023-12-21 12:19:18,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=31333.333333333332, ans=0.1 2023-12-21 12:19:22,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=31400.0, ans=0.125 2023-12-21 12:19:42,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=31533.333333333332, ans=0.125 2023-12-21 12:19:46,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=15.0 2023-12-21 12:19:49,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.10 vs. limit=10.0 2023-12-21 12:19:57,270 INFO [train.py:886] (1/4) Epoch 1, batch 4750, loss[loss=0.01944, audio_tagging_loss=0.01944, over 24750.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4948588.55 frames. ], batch size: 99, lr: 4.14e-02, grad_scale: 128.0 2023-12-21 12:20:03,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31666.666666666668, ans=0.125 2023-12-21 12:20:36,408 INFO [train.py:886] (1/4) Epoch 2, batch 0, loss[loss=0.04267, audio_tagging_loss=0.04267, over 25000.00 frames. ], tot_loss[loss=0.04267, audio_tagging_loss=0.04267, over 25000.00 frames. 
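[Editor's note] The recurring "ScheduledFloat: name=..., batch_count=..., ans=..." entries above report the current value (ans) of a scheduled hyperparameter, such as a dropout probability, a skip rate, or a whitening limit, as a function of batch_count. A minimal sketch of one plausible implementation, piecewise-linear interpolation between breakpoints, follows; the class name and the breakpoints are illustrative assumptions, not taken from the actual scaling.py code.

import bisect

class PiecewiseLinearFloat:
    # Sketch only: value is linearly interpolated between
    # (batch_count, value) breakpoints and held constant outside
    # the first/last breakpoint.
    def __init__(self, *points):
        self.xs = [p[0] for p in points]  # batch counts, sorted
        self.ys = [p[1] for p in points]  # values at those counts

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical example: a dropout rate decaying from 0.3 to 0.1
# over the first 20k batches would be logged with ans=0.3 early on.
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(0.0) == 0.3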
], batch size: 100, lr: 4.05e-02, grad_scale: 128.0 2023-12-21 12:20:36,409 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 12:20:51,087 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3386, 2.6639, 2.6063, 2.5027], device='cuda:1') 2023-12-21 12:20:59,083 INFO [train.py:917] (1/4) Epoch 2, validation: loss=0.0423, audio_tagging_loss=0.0423, over 3737520.00 frames. 2023-12-21 12:20:59,083 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 12:21:02,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=25.34 vs. limit=15.0 2023-12-21 12:21:09,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.42 vs. limit=22.5 2023-12-21 12:21:15,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.22 vs. limit=22.5 2023-12-21 12:21:19,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=31906.666666666668, ans=0.125 2023-12-21 12:21:19,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=31906.666666666668, ans=0.125 2023-12-21 12:21:20,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=31906.666666666668, ans=0.07 2023-12-21 12:21:35,980 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.667e+01 2.944e+01 3.472e+01 1.120e+02, threshold=5.887e+01, percent-clipped=2.0 2023-12-21 12:21:45,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=32040.0, ans=0.0 2023-12-21 12:21:49,437 INFO [train.py:886] (1/4) Epoch 2, batch 50, loss[loss=0.02435, audio_tagging_loss=0.02435, over 25000.00 frames. ], tot_loss[loss=0.03056, audio_tagging_loss=0.03056, over 1123175.39 frames. ], batch size: 100, lr: 4.05e-02, grad_scale: 128.0 2023-12-21 12:21:54,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=32106.666666666668, ans=0.0038898550724637687 2023-12-21 12:21:55,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=32106.666666666668, ans=0.0 2023-12-21 12:21:57,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=32106.666666666668, ans=0.025 2023-12-21 12:21:58,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.89 vs. limit=15.0 2023-12-21 12:22:10,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=32240.0, ans=0.125 2023-12-21 12:22:21,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=32306.666666666668, ans=0.2 2023-12-21 12:22:27,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. 
limit=15.0 2023-12-21 12:22:41,791 INFO [train.py:886] (1/4) Epoch 2, batch 100, loss[loss=0.01838, audio_tagging_loss=0.01838, over 25000.00 frames. ], tot_loss[loss=0.02605, audio_tagging_loss=0.02605, over 1976891.24 frames. ], batch size: 100, lr: 4.04e-02, grad_scale: 128.0 2023-12-21 12:22:46,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0 2023-12-21 12:22:58,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=32506.666666666668, ans=0.07 2023-12-21 12:23:00,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=32573.333333333332, ans=0.125 2023-12-21 12:23:18,387 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.835e+01 3.088e+01 3.489e+01 4.316e+01, threshold=6.177e+01, percent-clipped=0.0 2023-12-21 12:23:22,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=32706.666666666668, ans=0.125 2023-12-21 12:23:31,829 INFO [train.py:886] (1/4) Epoch 2, batch 150, loss[loss=0.01781, audio_tagging_loss=0.01781, over 24750.00 frames. ], tot_loss[loss=0.02349, audio_tagging_loss=0.02349, over 2638904.72 frames. ], batch size: 99, lr: 4.04e-02, grad_scale: 128.0 2023-12-21 12:23:38,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=32773.333333333336, ans=0.2 2023-12-21 12:23:44,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=32840.0, ans=0.0 2023-12-21 12:24:00,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=15.0 2023-12-21 12:24:08,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=32973.333333333336, ans=0.0037014492753623188 2023-12-21 12:24:23,410 INFO [train.py:886] (1/4) Epoch 2, batch 200, loss[loss=0.02027, audio_tagging_loss=0.02027, over 25000.00 frames. ], tot_loss[loss=0.02195, audio_tagging_loss=0.02195, over 3152089.34 frames. ], batch size: 100, lr: 4.03e-02, grad_scale: 128.0 2023-12-21 12:24:25,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=15.0 2023-12-21 12:24:27,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=33106.666666666664, ans=10.0 2023-12-21 12:24:46,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. limit=10.0 2023-12-21 12:24:50,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2023-12-21 12:24:59,199 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.541e+01 2.777e+01 3.081e+01 4.614e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-21 12:25:02,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.42 vs. 
limit=15.0 2023-12-21 12:25:06,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-12-21 12:25:09,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=15.0 2023-12-21 12:25:12,701 INFO [train.py:886] (1/4) Epoch 2, batch 250, loss[loss=0.0243, audio_tagging_loss=0.0243, over 24939.00 frames. ], tot_loss[loss=0.02124, audio_tagging_loss=0.02124, over 3555481.07 frames. ], batch size: 100, lr: 4.02e-02, grad_scale: 128.0 2023-12-21 12:25:23,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=33506.666666666664, ans=0.125 2023-12-21 12:25:25,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.14 vs. limit=6.0 2023-12-21 12:25:39,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2023-12-21 12:25:42,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33640.0, ans=0.125 2023-12-21 12:25:45,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33640.0, ans=0.1 2023-12-21 12:25:48,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=33640.0, ans=0.0 2023-12-21 12:26:01,517 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.093e+01 2023-12-21 12:26:02,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=33706.666666666664, ans=0.125 2023-12-21 12:26:05,322 INFO [train.py:886] (1/4) Epoch 2, batch 300, loss[loss=0.01949, audio_tagging_loss=0.01949, over 24750.00 frames. ], tot_loss[loss=0.02059, audio_tagging_loss=0.02059, over 3861733.32 frames. ], batch size: 99, lr: 4.02e-02, grad_scale: 128.0 2023-12-21 12:26:06,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33773.333333333336, ans=0.1 2023-12-21 12:26:14,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33840.0, ans=0.1 2023-12-21 12:26:21,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33840.0, ans=0.1 2023-12-21 12:26:25,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=33906.666666666664, ans=0.2 2023-12-21 12:26:27,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-12-21 12:26:29,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. 
limit=15.0 2023-12-21 12:26:32,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=33906.666666666664, ans=0.0 2023-12-21 12:26:41,964 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.636e+01 2.849e+01 3.270e+01 4.493e+01, threshold=5.697e+01, percent-clipped=0.0 2023-12-21 12:26:55,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=34040.0, ans=0.0 2023-12-21 12:26:57,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-21 12:26:58,350 INFO [train.py:886] (1/4) Epoch 2, batch 350, loss[loss=0.01429, audio_tagging_loss=0.01429, over 24750.00 frames. ], tot_loss[loss=0.02021, audio_tagging_loss=0.02021, over 4099639.84 frames. ], batch size: 99, lr: 4.01e-02, grad_scale: 128.0 2023-12-21 12:27:05,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=34106.666666666664, ans=0.125 2023-12-21 12:27:10,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=34173.333333333336, ans=0.0 2023-12-21 12:27:34,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=34306.666666666664, ans=0.003411594202898551 2023-12-21 12:27:48,705 INFO [train.py:886] (1/4) Epoch 2, batch 400, loss[loss=0.02173, audio_tagging_loss=0.02173, over 24750.00 frames. ], tot_loss[loss=0.01973, audio_tagging_loss=0.01973, over 4282294.11 frames. ], batch size: 99, lr: 4.00e-02, grad_scale: 128.0 2023-12-21 12:28:05,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34506.666666666664, ans=0.125 2023-12-21 12:28:06,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=34506.666666666664, ans=0.2 2023-12-21 12:28:10,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=34573.333333333336, ans=0.125 2023-12-21 12:28:12,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.61 vs. limit=22.5 2023-12-21 12:28:21,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=34640.0, ans=0.125 2023-12-21 12:28:21,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.02 vs. limit=15.0 2023-12-21 12:28:24,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=34640.0, ans=0.0033391304347826084 2023-12-21 12:28:27,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.614e+01 2.832e+01 3.282e+01 4.627e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-21 12:28:42,753 INFO [train.py:886] (1/4) Epoch 2, batch 450, loss[loss=0.01869, audio_tagging_loss=0.01869, over 24910.00 frames. ], tot_loss[loss=0.01937, audio_tagging_loss=0.01937, over 4432517.08 frames. 
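[Editor's note] The periodic optim.py warnings print five quantiles of recent gradient norms (min, 25%, median, 75%, max) next to a clipping threshold. In every warning in this section the threshold is exactly Clipping_scale times the median, e.g. 2.0 * 2.832e+01 = 5.664e+01 in the entry just above, so a reasonable reading is that gradients are clipped against a multiple of the running median norm, and percent-clipped is the fraction of recent steps that exceeded it. A sketch under that assumption (not the actual ScaledAdam code; clip_scale and its arguments are hypothetical):

import torch

def clip_scale(recent_norms, new_norm, clipping_scale=2.0):
    # Quantiles of recent gradient norms, in the order the log
    # prints them: [min, 25%, median, 75%, max].
    norms = torch.tensor(recent_norms, dtype=torch.float32)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # 2.0 * median
    # Scale the incoming gradient down if its norm exceeds the threshold.
    return min(1.0, threshold / (new_norm + 1.0e-20)), threshold

# Illustrative numbers: a gradient whose norm lands above
# 2.0 * median gets scaled down and counted as clipped.
scale, thr = clip_scale([28.3, 26.1, 32.8, 20.7, 46.3], new_norm=60.0)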
], batch size: 100, lr: 4.00e-02, grad_scale: 128.0 2023-12-21 12:28:49,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=34773.333333333336, ans=0.09899494936611666 2023-12-21 12:29:04,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=34906.666666666664, ans=0.0032811594202898555 2023-12-21 12:29:04,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=34906.666666666664, ans=0.2 2023-12-21 12:29:35,148 INFO [train.py:886] (1/4) Epoch 2, batch 500, loss[loss=0.02407, audio_tagging_loss=0.02407, over 24750.00 frames. ], tot_loss[loss=0.01915, audio_tagging_loss=0.01915, over 4552238.94 frames. ], batch size: 99, lr: 3.99e-02, grad_scale: 128.0 2023-12-21 12:29:38,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=35106.666666666664, ans=0.125 2023-12-21 12:29:50,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=35173.333333333336, ans=0.2 2023-12-21 12:30:13,255 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.483e+01 2.716e+01 2.937e+01 3.953e+01, threshold=5.433e+01, percent-clipped=0.0 2023-12-21 12:30:23,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=35373.333333333336, ans=0.0031797101449275358 2023-12-21 12:30:27,309 INFO [train.py:886] (1/4) Epoch 2, batch 550, loss[loss=0.02056, audio_tagging_loss=0.02056, over 24750.00 frames. ], tot_loss[loss=0.01905, audio_tagging_loss=0.01905, over 4641560.18 frames. ], batch size: 99, lr: 3.99e-02, grad_scale: 128.0 2023-12-21 12:30:33,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=35440.0, ans=0.003165217391304347 2023-12-21 12:30:51,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.57 vs. limit=15.0 2023-12-21 12:30:52,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.00 vs. limit=10.0 2023-12-21 12:30:59,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=35640.0, ans=0.125 2023-12-21 12:31:16,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2023-12-21 12:31:17,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=23.22 vs. limit=22.5 2023-12-21 12:31:20,663 INFO [train.py:886] (1/4) Epoch 2, batch 600, loss[loss=0.02108, audio_tagging_loss=0.02108, over 24750.00 frames. ], tot_loss[loss=0.0191, audio_tagging_loss=0.0191, over 4709787.95 frames. 
], batch size: 99, lr: 3.98e-02, grad_scale: 128.0 2023-12-21 12:31:23,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35773.333333333336, ans=0.125 2023-12-21 12:31:24,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=35773.333333333336, ans=0.125 2023-12-21 12:31:27,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-12-21 12:31:28,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=35773.333333333336, ans=0.09899494936611666 2023-12-21 12:31:31,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2023-12-21 12:31:39,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.93 vs. limit=22.5 2023-12-21 12:31:47,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=35906.666666666664, ans=0.125 2023-12-21 12:31:48,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=35906.666666666664, ans=0.003063768115942029 2023-12-21 12:31:58,312 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.564e+01 2.794e+01 3.187e+01 4.110e+01, threshold=5.587e+01, percent-clipped=0.0 2023-12-21 12:31:59,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=35973.333333333336, ans=0.125 2023-12-21 12:32:04,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=36040.0, ans=15.0 2023-12-21 12:32:09,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.54 vs. limit=22.5 2023-12-21 12:32:09,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36040.0, ans=0.1 2023-12-21 12:32:11,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=36106.666666666664, ans=0.2 2023-12-21 12:32:12,565 INFO [train.py:886] (1/4) Epoch 2, batch 650, loss[loss=0.01666, audio_tagging_loss=0.01666, over 24750.00 frames. ], tot_loss[loss=0.01914, audio_tagging_loss=0.01914, over 4756459.19 frames. ], batch size: 99, lr: 3.97e-02, grad_scale: 128.0 2023-12-21 12:32:24,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=36173.333333333336, ans=0.2 2023-12-21 12:32:30,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.23 vs. 
limit=22.5 2023-12-21 12:32:34,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=36240.0, ans=22.5 2023-12-21 12:32:59,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-12-21 12:33:03,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=36373.333333333336, ans=0.125 2023-12-21 12:33:06,076 INFO [train.py:886] (1/4) Epoch 2, batch 700, loss[loss=0.01797, audio_tagging_loss=0.01797, over 25000.00 frames. ], tot_loss[loss=0.01899, audio_tagging_loss=0.01899, over 4800492.34 frames. ], batch size: 100, lr: 3.97e-02, grad_scale: 128.0 2023-12-21 12:33:18,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=36506.666666666664, ans=10.0 2023-12-21 12:33:18,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=36506.666666666664, ans=0.0 2023-12-21 12:33:29,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=36573.333333333336, ans=0.07 2023-12-21 12:33:30,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=36573.333333333336, ans=0.125 2023-12-21 12:33:39,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-21 12:33:41,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=36640.0, ans=0.1 2023-12-21 12:33:43,498 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.539e+01 2.879e+01 3.158e+01 4.912e+01, threshold=5.759e+01, percent-clipped=0.0 2023-12-21 12:33:59,166 INFO [train.py:886] (1/4) Epoch 2, batch 750, loss[loss=0.01909, audio_tagging_loss=0.01909, over 25000.00 frames. ], tot_loss[loss=0.01888, audio_tagging_loss=0.01888, over 4833685.04 frames. ], batch size: 100, lr: 3.96e-02, grad_scale: 128.0 2023-12-21 12:34:00,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36773.333333333336, ans=0.125 2023-12-21 12:34:07,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=36773.333333333336, ans=0.2 2023-12-21 12:34:09,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=36840.0, ans=0.0 2023-12-21 12:34:21,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.59 vs. limit=22.5 2023-12-21 12:34:34,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5 2023-12-21 12:34:42,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2023-12-21 12:34:50,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.09 vs. 
limit=15.0 2023-12-21 12:34:51,117 INFO [train.py:886] (1/4) Epoch 2, batch 800, loss[loss=0.01784, audio_tagging_loss=0.01784, over 25000.00 frames. ], tot_loss[loss=0.01879, audio_tagging_loss=0.01879, over 4862546.23 frames. ], batch size: 100, lr: 3.95e-02, grad_scale: 128.0 2023-12-21 12:34:51,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=37106.666666666664, ans=0.0 2023-12-21 12:34:56,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=37106.666666666664, ans=0.0028028985507246376 2023-12-21 12:34:56,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.47 vs. limit=15.0 2023-12-21 12:34:59,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=15.0 2023-12-21 12:35:05,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-21 12:35:08,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=37173.333333333336, ans=0.125 2023-12-21 12:35:14,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-21 12:35:25,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.96 vs. limit=22.5 2023-12-21 12:35:29,688 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.588e+01 2.884e+01 3.147e+01 4.791e+01, threshold=5.768e+01, percent-clipped=0.0 2023-12-21 12:35:36,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=37373.333333333336, ans=0.125 2023-12-21 12:35:44,795 INFO [train.py:886] (1/4) Epoch 2, batch 850, loss[loss=0.01766, audio_tagging_loss=0.01766, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 4888461.64 frames. ], batch size: 100, lr: 3.95e-02, grad_scale: 128.0 2023-12-21 12:35:47,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=15.0 2023-12-21 12:35:53,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=15.0 2023-12-21 12:36:04,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=37573.333333333336, ans=0.125 2023-12-21 12:36:06,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=37573.333333333336, ans=0.125 2023-12-21 12:36:13,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.19 vs. limit=6.0 2023-12-21 12:36:21,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. 
limit=15.0 2023-12-21 12:36:30,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37706.666666666664, ans=0.125 2023-12-21 12:36:37,463 INFO [train.py:886] (1/4) Epoch 2, batch 900, loss[loss=0.02088, audio_tagging_loss=0.02088, over 24750.00 frames. ], tot_loss[loss=0.01876, audio_tagging_loss=0.01876, over 4902269.49 frames. ], batch size: 99, lr: 3.94e-02, grad_scale: 128.0 2023-12-21 12:36:59,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-21 12:37:04,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=37906.666666666664, ans=0.002628985507246377 2023-12-21 12:37:10,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-12-21 12:37:14,919 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.584e+01 2.867e+01 3.127e+01 3.908e+01, threshold=5.734e+01, percent-clipped=0.0 2023-12-21 12:37:23,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2023-12-21 12:37:28,452 INFO [train.py:886] (1/4) Epoch 2, batch 950, loss[loss=0.01967, audio_tagging_loss=0.01967, over 24750.00 frames. ], tot_loss[loss=0.0189, audio_tagging_loss=0.0189, over 4907018.22 frames. ], batch size: 99, lr: 3.94e-02, grad_scale: 128.0 2023-12-21 12:37:47,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=27.24 vs. limit=22.5 2023-12-21 12:38:04,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38306.666666666664, ans=0.1 2023-12-21 12:38:05,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=38306.666666666664, ans=0.125 2023-12-21 12:38:06,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38306.666666666664, ans=0.125 2023-12-21 12:38:15,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=38373.333333333336, ans=0.0 2023-12-21 12:38:18,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.89 vs. limit=22.5 2023-12-21 12:38:22,433 INFO [train.py:886] (1/4) Epoch 2, batch 1000, loss[loss=0.019, audio_tagging_loss=0.019, over 25000.00 frames. ], tot_loss[loss=0.0189, audio_tagging_loss=0.0189, over 4911780.87 frames. ], batch size: 100, lr: 3.93e-02, grad_scale: 128.0 2023-12-21 12:38:29,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5 2023-12-21 12:38:43,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. 
limit=15.0 2023-12-21 12:38:59,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=38640.0, ans=0.125 2023-12-21 12:39:00,023 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.513e+01 2.801e+01 3.177e+01 4.242e+01, threshold=5.602e+01, percent-clipped=0.0 2023-12-21 12:39:04,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=42.15 vs. limit=15.0 2023-12-21 12:39:05,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-12-21 12:39:07,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=38706.666666666664, ans=0.125 2023-12-21 12:39:08,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=38706.666666666664, ans=0.125 2023-12-21 12:39:14,307 INFO [train.py:886] (1/4) Epoch 2, batch 1050, loss[loss=0.01797, audio_tagging_loss=0.01797, over 25000.00 frames. ], tot_loss[loss=0.01866, audio_tagging_loss=0.01866, over 4921867.52 frames. ], batch size: 100, lr: 3.92e-02, grad_scale: 128.0 2023-12-21 12:39:14,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=38773.333333333336, ans=0.125 2023-12-21 12:39:25,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=38840.0, ans=0.035 2023-12-21 12:39:53,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-21 12:39:55,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.58 vs. limit=22.5 2023-12-21 12:39:57,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=39040.0, ans=0.0 2023-12-21 12:40:02,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=39040.0, ans=0.95 2023-12-21 12:40:04,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=39040.0, ans=0.0023826086956521735 2023-12-21 12:40:06,878 INFO [train.py:886] (1/4) Epoch 2, batch 1100, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.0186, audio_tagging_loss=0.0186, over 4928121.95 frames. 
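[Editor's note] The "Whitening: name=..., metric=X vs. limit=Y" entries compare a statistic of the activation covariance against a limit that is itself scheduled (see the whitening_limit ScheduledFloat entries). One reading consistent with the logged numbers, where the metric never exceeds the per-group channel count (e.g. metric=42.15 with num_channels=512 above) and 1.0 would mean a perfectly "white" covariance, is the eigenvalue-spread ratio E[lambda^2] / E[lambda]^2. A sketch under that assumption; this is not the actual scaling.py code:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # Assumed statistic: mean(eig^2) / mean(eig)^2 of the per-group
    # feature covariance. Equals 1.0 when the covariance is
    # proportional to identity; at most channels_per_group when the
    # covariance is rank-1.
    n, c = x.reshape(-1, x.shape[-1]).shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    cov = torch.matmul(x.transpose(1, 2), x) / n  # (groups, cpg, cpg)
    # trace(C @ C) = sum of squared entries for symmetric C.
    e2 = (cov * cov).sum(dim=(1, 2)) / cov.shape[-1]        # E[eig^2]
    e1 = torch.diagonal(cov, dim1=1, dim2=2).sum(-1) / cov.shape[-1]
    return (e2 / (e1 * e1 + 1.0e-20)).mean().item()

# White noise gives a metric near 1.0, well under limits like 15.0:
print(whitening_metric(torch.randn(1000, 512), num_groups=1))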
], batch size: 100, lr: 3.92e-02, grad_scale: 128.0 2023-12-21 12:40:08,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=39106.666666666664, ans=0.0 2023-12-21 12:40:17,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=39173.333333333336, ans=0.125 2023-12-21 12:40:29,309 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.903e+00 2023-12-21 12:40:35,014 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.321e+01 2023-12-21 12:40:43,541 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.542e+01 2.826e+01 3.168e+01 4.060e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-21 12:40:45,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=39306.666666666664, ans=0.0 2023-12-21 12:40:59,812 INFO [train.py:886] (1/4) Epoch 2, batch 1150, loss[loss=0.02, audio_tagging_loss=0.02, over 25000.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 4930035.23 frames. ], batch size: 100, lr: 3.91e-02, grad_scale: 128.0 2023-12-21 12:41:01,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.07 vs. limit=22.5 2023-12-21 12:41:03,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=39440.0, ans=0.2 2023-12-21 12:41:13,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=39506.666666666664, ans=0.1 2023-12-21 12:41:22,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2023-12-21 12:41:30,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=39640.0, ans=0.125 2023-12-21 12:41:46,422 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.134e+00 2023-12-21 12:41:50,041 INFO [train.py:886] (1/4) Epoch 2, batch 1200, loss[loss=0.01685, audio_tagging_loss=0.01685, over 25000.00 frames. ], tot_loss[loss=0.01866, audio_tagging_loss=0.01866, over 4939299.43 frames. ], batch size: 100, lr: 3.90e-02, grad_scale: 128.0 2023-12-21 12:42:07,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=39840.0, ans=0.002208695652173912 2023-12-21 12:42:26,680 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.581e+01 2.851e+01 3.035e+01 4.083e+01, threshold=5.702e+01, percent-clipped=0.0 2023-12-21 12:42:42,774 INFO [train.py:886] (1/4) Epoch 2, batch 1250, loss[loss=0.01731, audio_tagging_loss=0.01731, over 24057.00 frames. ], tot_loss[loss=0.01885, audio_tagging_loss=0.01885, over 4938037.32 frames. 
], batch size: 100, lr: 3.90e-02, grad_scale: 128.0 2023-12-21 12:42:46,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40106.666666666664, ans=0.1 2023-12-21 12:43:07,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=40240.0, ans=0.125 2023-12-21 12:43:07,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=40240.0, ans=0.2 2023-12-21 12:43:13,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.71 vs. limit=15.0 2023-12-21 12:43:18,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=40306.666666666664, ans=0.2 2023-12-21 12:43:23,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=40373.333333333336, ans=0.125 2023-12-21 12:43:24,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-12-21 12:43:25,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=40373.333333333336, ans=0.125 2023-12-21 12:43:28,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=40373.333333333336, ans=0.125 2023-12-21 12:43:28,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=15.0 2023-12-21 12:43:34,326 INFO [train.py:886] (1/4) Epoch 2, batch 1300, loss[loss=0.01839, audio_tagging_loss=0.01839, over 25000.00 frames. ], tot_loss[loss=0.01888, audio_tagging_loss=0.01888, over 4940079.91 frames. ], batch size: 100, lr: 3.89e-02, grad_scale: 128.0 2023-12-21 12:43:37,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=40440.0, ans=15.0 2023-12-21 12:43:39,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=40440.0, ans=0.125 2023-12-21 12:43:43,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. 
limit=15.0 2023-12-21 12:43:44,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=40506.666666666664, ans=0.125 2023-12-21 12:44:00,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40573.333333333336, ans=0.1 2023-12-21 12:44:07,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=40640.0, ans=0.95 2023-12-21 12:44:07,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=40640.0, ans=0.125 2023-12-21 12:44:09,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40640.0, ans=0.1 2023-12-21 12:44:10,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=40640.0, ans=0.125 2023-12-21 12:44:11,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.17 vs. limit=22.5 2023-12-21 12:44:11,692 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.485e+01 2.836e+01 3.251e+01 4.235e+01, threshold=5.672e+01, percent-clipped=0.0 2023-12-21 12:44:25,326 INFO [train.py:886] (1/4) Epoch 2, batch 1350, loss[loss=0.01957, audio_tagging_loss=0.01957, over 25000.00 frames. ], tot_loss[loss=0.01874, audio_tagging_loss=0.01874, over 4944409.91 frames. ], batch size: 100, lr: 3.88e-02, grad_scale: 128.0 2023-12-21 12:44:27,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=40773.333333333336, ans=0.015 2023-12-21 12:44:31,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40773.333333333336, ans=0.1 2023-12-21 12:44:45,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2023-12-21 12:44:55,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2023-12-21 12:45:10,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=41040.0, ans=0.0 2023-12-21 12:45:16,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.93 vs. limit=15.0 2023-12-21 12:45:17,963 INFO [train.py:886] (1/4) Epoch 2, batch 1400, loss[loss=0.01725, audio_tagging_loss=0.01725, over 25000.00 frames. ], tot_loss[loss=0.01852, audio_tagging_loss=0.01852, over 4947541.60 frames. 
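[Editor's note] Many ScheduledFloat entries belong to balancer constraints (names ending in min_positive, max_positive, max_abs, prob). The sketch below only illustrates the per-channel statistics such a constraint plausibly watches; the thresholds are illustrative, the logged "prob" (often 0.125) is read here as the probability of applying the constraint on a given batch, and the real scaling.py Balancer acts on gradients with differentiable surrogates rather than a loss term:

import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.05,
                     max_positive: float = 0.95,
                     max_abs: float = 10.0) -> torch.Tensor:
    # Per-channel statistics over the batch dimension.
    pos_frac = (x > 0).float().mean(dim=0)  # fraction of positives
    mean_abs = x.abs().mean(dim=0)          # mean magnitude
    # Zero while every channel stays inside its allowed ranges.
    # Note: (x > 0) is not differentiable; an actual implementation
    # would use a smoothed proxy.
    return (torch.relu(min_positive - pos_frac).sum()
            + torch.relu(pos_frac - max_positive).sum()
            + torch.relu(mean_abs - max_abs).sum())

penalty = balancer_penalty(torch.randn(100, 256))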
], batch size: 100, lr: 3.88e-02, grad_scale: 128.0 2023-12-21 12:45:25,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=41106.666666666664, ans=0.0019333333333333338 2023-12-21 12:45:43,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=41240.0, ans=0.125 2023-12-21 12:45:54,848 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.439e+01 2.675e+01 2.970e+01 3.748e+01, threshold=5.350e+01, percent-clipped=0.0 2023-12-21 12:46:05,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41373.333333333336, ans=0.1 2023-12-21 12:46:08,313 INFO [train.py:886] (1/4) Epoch 2, batch 1450, loss[loss=0.01875, audio_tagging_loss=0.01875, over 24750.00 frames. ], tot_loss[loss=0.01846, audio_tagging_loss=0.01846, over 4955006.03 frames. ], batch size: 99, lr: 3.87e-02, grad_scale: 128.0 2023-12-21 12:46:15,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=41440.0, ans=0.125 2023-12-21 12:46:19,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.54 vs. limit=22.5 2023-12-21 12:46:26,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=41506.666666666664, ans=0.0 2023-12-21 12:46:35,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.39 vs. limit=10.0 2023-12-21 12:46:37,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.08 vs. limit=15.0 2023-12-21 12:46:48,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=41640.0, ans=0.2 2023-12-21 12:46:51,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=41706.666666666664, ans=0.1 2023-12-21 12:46:53,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=41706.666666666664, ans=10.0 2023-12-21 12:46:58,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=41706.666666666664, ans=0.0 2023-12-21 12:47:01,210 INFO [train.py:886] (1/4) Epoch 2, batch 1500, loss[loss=0.02314, audio_tagging_loss=0.02314, over 25000.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 4960017.89 frames. 
], batch size: 100, lr: 3.87e-02, grad_scale: 256.0 2023-12-21 12:47:03,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=41773.333333333336, ans=0.125 2023-12-21 12:47:12,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=41840.0, ans=0.125 2023-12-21 12:47:12,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=41840.0, ans=0.125 2023-12-21 12:47:19,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=41840.0, ans=0.2 2023-12-21 12:47:19,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=41840.0, ans=0.125 2023-12-21 12:47:20,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=41840.0, ans=0.125 2023-12-21 12:47:32,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=41973.333333333336, ans=0.07 2023-12-21 12:47:37,916 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.550e+01 2.764e+01 3.124e+01 4.346e+01, threshold=5.529e+01, percent-clipped=0.0 2023-12-21 12:47:40,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=41973.333333333336, ans=10.0 2023-12-21 12:47:42,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=42040.0, ans=0.2 2023-12-21 12:47:52,771 INFO [train.py:886] (1/4) Epoch 2, batch 1550, loss[loss=0.01859, audio_tagging_loss=0.01859, over 24750.00 frames. ], tot_loss[loss=0.01871, audio_tagging_loss=0.01871, over 4955315.87 frames. ], batch size: 99, lr: 3.86e-02, grad_scale: 256.0 2023-12-21 12:47:55,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=42106.666666666664, ans=0.0 2023-12-21 12:47:57,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=42106.666666666664, ans=0.2 2023-12-21 12:47:58,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=42106.666666666664, ans=10.0 2023-12-21 12:47:59,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=42106.666666666664, ans=0.125 2023-12-21 12:48:02,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2023-12-21 12:48:05,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=42173.333333333336, ans=0.0 2023-12-21 12:48:10,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42173.333333333336, ans=0.125 2023-12-21 12:48:27,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.71 vs. 
limit=15.0 2023-12-21 12:48:27,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=15.0 2023-12-21 12:48:30,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=42306.666666666664, ans=0.1 2023-12-21 12:48:35,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=42373.333333333336, ans=0.07 2023-12-21 12:48:40,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-12-21 12:48:43,468 INFO [train.py:886] (1/4) Epoch 2, batch 1600, loss[loss=0.01861, audio_tagging_loss=0.01861, over 24750.00 frames. ], tot_loss[loss=0.01869, audio_tagging_loss=0.01869, over 4952857.05 frames. ], batch size: 99, lr: 3.85e-02, grad_scale: 256.0 2023-12-21 12:48:49,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=42440.0, ans=0.2 2023-12-21 12:48:52,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=42440.0, ans=0.125 2023-12-21 12:48:55,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=42506.666666666664, ans=0.0 2023-12-21 12:48:57,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=42506.666666666664, ans=0.95 2023-12-21 12:49:07,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=42573.333333333336, ans=0.0 2023-12-21 12:49:10,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42573.333333333336, ans=0.1 2023-12-21 12:49:10,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0 2023-12-21 12:49:21,558 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.625e+01 2.827e+01 3.147e+01 4.034e+01, threshold=5.654e+01, percent-clipped=0.0 2023-12-21 12:49:33,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=42706.666666666664, ans=0.1 2023-12-21 12:49:36,918 INFO [train.py:886] (1/4) Epoch 2, batch 1650, loss[loss=0.01718, audio_tagging_loss=0.01718, over 24750.00 frames. ], tot_loss[loss=0.01851, audio_tagging_loss=0.01851, over 4950897.36 frames. ], batch size: 99, lr: 3.85e-02, grad_scale: 256.0 2023-12-21 12:49:37,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=42773.333333333336, ans=0.125 2023-12-21 12:49:46,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=42840.0, ans=0.04949747468305833 2023-12-21 12:49:56,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=15.0 2023-12-21 12:49:59,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42906.666666666664, ans=0.125 2023-12-21 12:50:00,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=42906.666666666664, ans=0.125 2023-12-21 12:50:11,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=42973.333333333336, ans=0.0 2023-12-21 12:50:22,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-12-21 12:50:23,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=43040.0, ans=0.0 2023-12-21 12:50:25,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=43040.0, ans=0.125 2023-12-21 12:50:29,839 INFO [train.py:886] (1/4) Epoch 2, batch 1700, loss[loss=0.01818, audio_tagging_loss=0.01818, over 25000.00 frames. ], tot_loss[loss=0.01853, audio_tagging_loss=0.01853, over 4940627.21 frames. ], batch size: 100, lr: 3.84e-02, grad_scale: 256.0 2023-12-21 12:50:43,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=12.0 2023-12-21 12:50:53,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.46 vs. limit=15.0 2023-12-21 12:51:01,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=43306.666666666664, ans=0.125 2023-12-21 12:51:07,475 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.509e+01 2.799e+01 3.084e+01 4.189e+01, threshold=5.598e+01, percent-clipped=0.0 2023-12-21 12:51:16,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=43373.333333333336, ans=0.125 2023-12-21 12:51:21,605 INFO [train.py:886] (1/4) Epoch 2, batch 1750, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4951076.86 frames. ], batch size: 100, lr: 3.83e-02, grad_scale: 256.0 2023-12-21 12:51:30,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=43506.666666666664, ans=0.125 2023-12-21 12:51:58,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2023-12-21 12:51:59,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0 2023-12-21 12:52:02,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=43706.666666666664, ans=0.0 2023-12-21 12:52:14,321 INFO [train.py:886] (1/4) Epoch 2, batch 1800, loss[loss=0.01692, audio_tagging_loss=0.01692, over 25000.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 4959233.61 frames. 
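[Editor's note] The grad_scale field in the progress records is the fp16 loss scale: it holds at 128.0 through batch 1450 and stands at 256.0 from the batch 1500 record onward, the pattern dynamic loss scaling produces when the scale doubles after a run of overflow-free steps and is halved on inf/nan gradients. A sketch using torch.cuda.amp.GradScaler; the growth interval below is an assumption, not read from the log:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=128.0,      # matches the grad_scale logged above
    growth_factor=2.0,     # double after a stretch of stable steps
    backoff_factor=0.5,    # halve immediately on inf/nan gradients
    growth_interval=2000,  # steps between growth attempts (assumed)
)

# Typical use (model, optimizer, loss_fn assumed to exist):
#   with torch.cuda.amp.autocast():
#       loss = loss_fn(model(x), y)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
#   scaler.get_scale()  # -> 128.0, later 256.0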
], batch size: 100, lr: 3.83e-02, grad_scale: 256.0 2023-12-21 12:52:20,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=43773.333333333336, ans=0.125 2023-12-21 12:52:30,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=43840.0, ans=0.0 2023-12-21 12:52:35,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=30.52 vs. limit=15.0 2023-12-21 12:52:35,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.31 vs. limit=15.0 2023-12-21 12:52:49,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43973.333333333336, ans=0.1 2023-12-21 12:52:51,538 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.458e+01 2.715e+01 2.989e+01 4.266e+01, threshold=5.430e+01, percent-clipped=0.0 2023-12-21 12:52:51,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=43973.333333333336, ans=0.0 2023-12-21 12:53:05,769 INFO [train.py:886] (1/4) Epoch 2, batch 1850, loss[loss=0.01744, audio_tagging_loss=0.01744, over 25000.00 frames. ], tot_loss[loss=0.01853, audio_tagging_loss=0.01853, over 4951498.67 frames. ], batch size: 100, lr: 3.82e-02, grad_scale: 256.0 2023-12-21 12:53:05,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=44106.666666666664, ans=0.0 2023-12-21 12:53:13,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=44106.666666666664, ans=0.05 2023-12-21 12:53:24,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=44173.333333333336, ans=0.125 2023-12-21 12:53:33,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2023-12-21 12:53:34,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=44240.0, ans=0.2 2023-12-21 12:53:35,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.49 vs. limit=22.5 2023-12-21 12:53:52,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=44373.333333333336, ans=0.0 2023-12-21 12:53:53,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=44373.333333333336, ans=0.125 2023-12-21 12:53:53,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=12.0 2023-12-21 12:53:54,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.96 vs. 
limit=10.0 2023-12-21 12:53:57,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44373.333333333336, ans=0.1 2023-12-21 12:53:59,542 INFO [train.py:886] (1/4) Epoch 2, batch 1900, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.0187, audio_tagging_loss=0.0187, over 4942580.34 frames. ], batch size: 100, lr: 3.81e-02, grad_scale: 256.0 2023-12-21 12:54:05,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=44440.0, ans=0.125 2023-12-21 12:54:15,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.35 vs. limit=10.0 2023-12-21 12:54:26,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=44573.333333333336, ans=0.125 2023-12-21 12:54:36,232 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.600e+01 2.818e+01 3.089e+01 5.483e+01, threshold=5.636e+01, percent-clipped=1.0 2023-12-21 12:54:42,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=44706.666666666664, ans=0.125 2023-12-21 12:54:52,155 INFO [train.py:886] (1/4) Epoch 2, batch 1950, loss[loss=0.01664, audio_tagging_loss=0.01664, over 25000.00 frames. ], tot_loss[loss=0.01858, audio_tagging_loss=0.01858, over 4937883.51 frames. ], batch size: 100, lr: 3.81e-02, grad_scale: 256.0 2023-12-21 12:55:25,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.12 vs. limit=22.5 2023-12-21 12:55:32,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.33 vs. limit=12.0 2023-12-21 12:55:44,295 INFO [train.py:886] (1/4) Epoch 2, batch 2000, loss[loss=0.01643, audio_tagging_loss=0.01643, over 24750.00 frames. ], tot_loss[loss=0.01834, audio_tagging_loss=0.01834, over 4946164.33 frames. ], batch size: 99, lr: 3.80e-02, grad_scale: 256.0 2023-12-21 12:55:44,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.92 vs. limit=15.0 2023-12-21 12:55:59,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45173.333333333336, ans=0.125 2023-12-21 12:56:10,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.39 vs. limit=22.5 2023-12-21 12:56:17,327 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.295e+01 2023-12-21 12:56:19,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=45306.666666666664, ans=0.2 2023-12-21 12:56:23,061 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.490e+01 2.748e+01 3.106e+01 5.965e+01, threshold=5.495e+01, percent-clipped=1.0 2023-12-21 12:56:38,164 INFO [train.py:886] (1/4) Epoch 2, batch 2050, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4947888.40 frames. 
], batch size: 100, lr: 3.80e-02, grad_scale: 256.0 2023-12-21 12:56:39,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=45440.0, ans=0.0009913043478260858 2023-12-21 12:56:43,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45440.0, ans=0.1 2023-12-21 12:56:48,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.47 vs. limit=10.0 2023-12-21 12:57:09,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0 2023-12-21 12:57:17,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.80 vs. limit=22.5 2023-12-21 12:57:31,674 INFO [train.py:886] (1/4) Epoch 2, batch 2100, loss[loss=0.01995, audio_tagging_loss=0.01995, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4952637.49 frames. ], batch size: 100, lr: 3.79e-02, grad_scale: 256.0 2023-12-21 12:57:44,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=45840.0, ans=0.5 2023-12-21 12:57:59,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-12-21 12:58:02,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.06 vs. limit=22.5 2023-12-21 12:58:10,348 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.523e+01 2.813e+01 3.062e+01 4.027e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 12:58:21,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=46040.0, ans=0.125 2023-12-21 12:58:23,695 INFO [train.py:886] (1/4) Epoch 2, batch 2150, loss[loss=0.01729, audio_tagging_loss=0.01729, over 24750.00 frames. ], tot_loss[loss=0.01832, audio_tagging_loss=0.01832, over 4951078.55 frames. ], batch size: 99, lr: 3.78e-02, grad_scale: 256.0 2023-12-21 12:58:33,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46173.333333333336, ans=0.125 2023-12-21 12:58:34,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=46173.333333333336, ans=0.125 2023-12-21 12:58:36,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. 
limit=6.0 2023-12-21 12:58:57,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=46306.666666666664, ans=0.2 2023-12-21 12:59:03,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=46306.666666666664, ans=0.125 2023-12-21 12:59:03,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=46306.666666666664, ans=0.2 2023-12-21 12:59:16,523 INFO [train.py:886] (1/4) Epoch 2, batch 2200, loss[loss=0.02193, audio_tagging_loss=0.02193, over 24750.00 frames. ], tot_loss[loss=0.01848, audio_tagging_loss=0.01848, over 4947029.49 frames. ], batch size: 99, lr: 3.78e-02, grad_scale: 256.0 2023-12-21 12:59:21,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46440.0, ans=0.1 2023-12-21 12:59:27,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=46506.666666666664, ans=22.5 2023-12-21 12:59:36,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=15.0 2023-12-21 12:59:48,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46640.0, ans=0.1 2023-12-21 12:59:52,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=46640.0, ans=0.125 2023-12-21 12:59:54,647 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.553e+01 2.739e+01 3.029e+01 4.205e+01, threshold=5.478e+01, percent-clipped=0.0 2023-12-21 12:59:54,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=46640.0, ans=0.125 2023-12-21 12:59:56,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=46640.0, ans=10.0 2023-12-21 13:00:01,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=46706.666666666664, ans=0.125 2023-12-21 13:00:09,442 INFO [train.py:886] (1/4) Epoch 2, batch 2250, loss[loss=0.01657, audio_tagging_loss=0.01657, over 24057.00 frames. ], tot_loss[loss=0.01857, audio_tagging_loss=0.01857, over 4940192.37 frames. 
], batch size: 100, lr: 3.77e-02, grad_scale: 256.0 2023-12-21 13:00:15,166 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:00:18,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=46773.333333333336, ans=0.125 2023-12-21 13:00:24,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=46840.0, ans=0.0 2023-12-21 13:00:25,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=46840.0, ans=0.125 2023-12-21 13:00:27,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=46840.0, ans=0.00068695652173913 2023-12-21 13:01:01,796 INFO [train.py:886] (1/4) Epoch 2, batch 2300, loss[loss=0.01818, audio_tagging_loss=0.01818, over 24750.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 4944778.29 frames. ], batch size: 99, lr: 3.76e-02, grad_scale: 256.0 2023-12-21 13:01:13,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47173.333333333336, ans=0.1 2023-12-21 13:01:14,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=47173.333333333336, ans=0.0 2023-12-21 13:01:27,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=47240.0, ans=0.0005999999999999998 2023-12-21 13:01:30,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=12.0 2023-12-21 13:01:31,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=47240.0, ans=0.04949747468305833 2023-12-21 13:01:38,829 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.499e+01 2.770e+01 3.074e+01 4.050e+01, threshold=5.539e+01, percent-clipped=0.0 2023-12-21 13:01:39,158 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.023e+00 2023-12-21 13:01:54,323 INFO [train.py:886] (1/4) Epoch 2, batch 2350, loss[loss=0.01796, audio_tagging_loss=0.01796, over 25000.00 frames. ], tot_loss[loss=0.0183, audio_tagging_loss=0.0183, over 4947222.36 frames. ], batch size: 100, lr: 3.76e-02, grad_scale: 256.0 2023-12-21 13:02:16,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.69 vs. limit=10.0 2023-12-21 13:02:25,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.23 vs. limit=15.0 2023-12-21 13:02:34,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=47706.666666666664, ans=0.1 2023-12-21 13:02:44,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=47773.333333333336, ans=0.125 2023-12-21 13:02:45,216 INFO [train.py:886] (1/4) Epoch 2, batch 2400, loss[loss=0.01965, audio_tagging_loss=0.01965, over 25000.00 frames. ], tot_loss[loss=0.01824, audio_tagging_loss=0.01824, over 4953039.18 frames. 
], batch size: 100, lr: 3.75e-02, grad_scale: 256.0 2023-12-21 13:02:48,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=15.0 2023-12-21 13:02:49,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=47773.333333333336, ans=0.125 2023-12-21 13:02:55,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47840.0, ans=0.1 2023-12-21 13:02:59,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=47840.0, ans=0.07 2023-12-21 13:03:03,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=47840.0, ans=0.125 2023-12-21 13:03:20,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5 2023-12-21 13:03:23,055 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.458e+01 2.728e+01 3.033e+01 4.100e+01, threshold=5.456e+01, percent-clipped=0.0 2023-12-21 13:03:29,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=48040.0, ans=0.0004260869565217394 2023-12-21 13:03:37,947 INFO [train.py:886] (1/4) Epoch 2, batch 2450, loss[loss=0.02039, audio_tagging_loss=0.02039, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4955592.21 frames. ], batch size: 100, lr: 3.75e-02, grad_scale: 256.0 2023-12-21 13:03:41,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=48106.666666666664, ans=0.00041159420289855163 2023-12-21 13:04:10,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=48306.666666666664, ans=10.0 2023-12-21 13:04:10,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48306.666666666664, ans=0.1 2023-12-21 13:04:18,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.80 vs. limit=22.5 2023-12-21 13:04:30,287 INFO [train.py:886] (1/4) Epoch 2, batch 2500, loss[loss=0.018, audio_tagging_loss=0.018, over 24750.00 frames. ], tot_loss[loss=0.01831, audio_tagging_loss=0.01831, over 4953082.03 frames. ], batch size: 99, lr: 3.74e-02, grad_scale: 256.0 2023-12-21 13:04:41,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=48506.666666666664, ans=0.0003246376811594214 2023-12-21 13:04:47,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48506.666666666664, ans=0.125 2023-12-21 13:04:48,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. 
limit=15.0 2023-12-21 13:04:54,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=48573.333333333336, ans=0.125 2023-12-21 13:04:55,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=48573.333333333336, ans=0.125 2023-12-21 13:05:05,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.22 vs. limit=22.5 2023-12-21 13:05:08,426 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 2.552e+01 2.788e+01 3.039e+01 3.953e+01, threshold=5.575e+01, percent-clipped=0.0 2023-12-21 13:05:21,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.57 vs. limit=22.5 2023-12-21 13:05:21,983 INFO [train.py:886] (1/4) Epoch 2, batch 2550, loss[loss=0.01595, audio_tagging_loss=0.01595, over 24750.00 frames. ], tot_loss[loss=0.01835, audio_tagging_loss=0.01835, over 4953238.80 frames. ], batch size: 99, lr: 3.73e-02, grad_scale: 256.0 2023-12-21 13:05:41,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=48840.0, ans=0.0 2023-12-21 13:05:44,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.91 vs. limit=22.5 2023-12-21 13:05:55,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=48973.333333333336, ans=0.0 2023-12-21 13:05:55,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.55 vs. limit=10.0 2023-12-21 13:05:56,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=48973.333333333336, ans=6.0 2023-12-21 13:06:14,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49040.0, ans=0.1 2023-12-21 13:06:15,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=49106.666666666664, ans=0.125 2023-12-21 13:06:16,098 INFO [train.py:886] (1/4) Epoch 2, batch 2600, loss[loss=0.01899, audio_tagging_loss=0.01899, over 24033.00 frames. ], tot_loss[loss=0.0184, audio_tagging_loss=0.0184, over 4952793.38 frames. 
], batch size: 100, lr: 3.73e-02, grad_scale: 256.0 2023-12-21 13:06:20,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=49106.666666666664, ans=0.0 2023-12-21 13:06:30,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49173.333333333336, ans=0.1 2023-12-21 13:06:34,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=49173.333333333336, ans=10.0 2023-12-21 13:06:38,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=49240.0, ans=0.0 2023-12-21 13:06:38,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=49240.0, ans=0.125 2023-12-21 13:06:39,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=49240.0, ans=0.125 2023-12-21 13:06:50,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=49306.666666666664, ans=0.2 2023-12-21 13:06:53,175 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.505e+01 2.770e+01 3.074e+01 4.443e+01, threshold=5.539e+01, percent-clipped=0.0 2023-12-21 13:07:07,317 INFO [train.py:886] (1/4) Epoch 2, batch 2650, loss[loss=0.0207, audio_tagging_loss=0.0207, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4953588.22 frames. ], batch size: 100, lr: 3.72e-02, grad_scale: 256.0 2023-12-21 13:07:07,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=49440.0, ans=0.07 2023-12-21 13:07:09,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=49440.0, ans=0.125 2023-12-21 13:07:18,870 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.520e+00 2023-12-21 13:07:23,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49506.666666666664, ans=0.1 2023-12-21 13:07:23,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0 2023-12-21 13:07:25,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=49506.666666666664, ans=0.0 2023-12-21 13:07:27,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.35 vs. limit=10.0 2023-12-21 13:07:34,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=49573.333333333336, ans=0.125 2023-12-21 13:07:42,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=15.0 2023-12-21 13:07:42,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.53 vs. 
limit=22.5 2023-12-21 13:08:00,853 INFO [train.py:886] (1/4) Epoch 2, batch 2700, loss[loss=0.01821, audio_tagging_loss=0.01821, over 24750.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 4957546.74 frames. ], batch size: 99, lr: 3.71e-02, grad_scale: 256.0 2023-12-21 13:08:01,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=49773.333333333336, ans=0.0 2023-12-21 13:08:16,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=49840.0, ans=0.1 2023-12-21 13:08:36,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-12-21 13:08:38,475 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.555e+01 2.825e+01 3.141e+01 4.056e+01, threshold=5.650e+01, percent-clipped=0.0 2023-12-21 13:08:40,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=49973.333333333336, ans=0.09899494936611666 2023-12-21 13:08:53,329 INFO [train.py:886] (1/4) Epoch 2, batch 2750, loss[loss=0.01593, audio_tagging_loss=0.01593, over 24750.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4960760.06 frames. ], batch size: 99, lr: 3.71e-02, grad_scale: 256.0 2023-12-21 13:09:02,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-12-21 13:09:03,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50173.333333333336, ans=0.1 2023-12-21 13:09:13,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=50240.0, ans=0.07 2023-12-21 13:09:37,716 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=9.487e-02 2023-12-21 13:09:40,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=50373.333333333336, ans=0.125 2023-12-21 13:09:41,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=50373.333333333336, ans=0.125 2023-12-21 13:09:45,139 INFO [train.py:886] (1/4) Epoch 2, batch 2800, loss[loss=0.02009, audio_tagging_loss=0.02009, over 24750.00 frames. ], tot_loss[loss=0.01842, audio_tagging_loss=0.01842, over 4959292.51 frames. ], batch size: 99, lr: 3.70e-02, grad_scale: 256.0 2023-12-21 13:09:50,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. 
limit=15.0 2023-12-21 13:09:56,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=50506.666666666664, ans=0.0 2023-12-21 13:09:58,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=50506.666666666664, ans=0.2 2023-12-21 13:10:01,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=50506.666666666664, ans=0.0 2023-12-21 13:10:11,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=50573.333333333336, ans=0.125 2023-12-21 13:10:12,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=50573.333333333336, ans=0.0 2023-12-21 13:10:20,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=50640.0, ans=0.2 2023-12-21 13:10:23,047 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.547e+01 2.749e+01 3.005e+01 4.633e+01, threshold=5.497e+01, percent-clipped=0.0 2023-12-21 13:10:38,451 INFO [train.py:886] (1/4) Epoch 2, batch 2850, loss[loss=0.02006, audio_tagging_loss=0.02006, over 24750.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 4952392.26 frames. ], batch size: 99, lr: 3.70e-02, grad_scale: 256.0 2023-12-21 13:10:42,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=50773.333333333336, ans=0.0 2023-12-21 13:10:54,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=50840.0, ans=0.2 2023-12-21 13:10:58,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=50906.666666666664, ans=0.2 2023-12-21 13:11:14,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=50973.333333333336, ans=0.1 2023-12-21 13:11:22,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.03 vs. limit=15.0 2023-12-21 13:11:24,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=51040.0, ans=0.0 2023-12-21 13:11:30,670 INFO [train.py:886] (1/4) Epoch 2, batch 2900, loss[loss=0.0162, audio_tagging_loss=0.0162, over 24750.00 frames. ], tot_loss[loss=0.01836, audio_tagging_loss=0.01836, over 4950869.51 frames. ], batch size: 99, lr: 3.69e-02, grad_scale: 256.0 2023-12-21 13:11:51,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2023-12-21 13:12:04,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-12-21 13:12:08,839 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.551e+01 2.842e+01 3.147e+01 4.281e+01, threshold=5.684e+01, percent-clipped=0.0 2023-12-21 13:12:22,990 INFO [train.py:886] (1/4) Epoch 2, batch 2950, loss[loss=0.01661, audio_tagging_loss=0.01661, over 25000.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4949772.70 frames. 
], batch size: 100, lr: 3.68e-02, grad_scale: 256.0 2023-12-21 13:13:15,805 INFO [train.py:886] (1/4) Epoch 2, batch 3000, loss[loss=0.01893, audio_tagging_loss=0.01893, over 25000.00 frames. ], tot_loss[loss=0.01832, audio_tagging_loss=0.01832, over 4954579.35 frames. ], batch size: 100, lr: 3.68e-02, grad_scale: 256.0 2023-12-21 13:13:15,806 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 13:13:27,991 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5607, 4.0402, 4.2397, 4.0460], device='cuda:1') 2023-12-21 13:13:38,878 INFO [train.py:917] (1/4) Epoch 2, validation: loss=0.04373, audio_tagging_loss=0.04373, over 3737520.00 frames. 2023-12-21 13:13:38,879 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 13:13:45,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.74 vs. limit=22.5 2023-12-21 13:13:52,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=15.0 2023-12-21 13:13:54,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=51840.0, ans=0.2 2023-12-21 13:14:04,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51906.666666666664, ans=0.1 2023-12-21 13:14:08,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=51906.666666666664, ans=0.07 2023-12-21 13:14:09,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2023-12-21 13:14:15,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=51973.333333333336, ans=0.2 2023-12-21 13:14:16,990 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.493e+01 2.750e+01 3.115e+01 4.237e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 13:14:31,124 INFO [train.py:886] (1/4) Epoch 2, batch 3050, loss[loss=0.01804, audio_tagging_loss=0.01804, over 25000.00 frames. ], tot_loss[loss=0.01827, audio_tagging_loss=0.01827, over 4956570.69 frames. ], batch size: 100, lr: 3.67e-02, grad_scale: 256.0 2023-12-21 13:14:33,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=52106.666666666664, ans=0.2 2023-12-21 13:14:36,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=52106.666666666664, ans=0.2 2023-12-21 13:15:14,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-12-21 13:15:15,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=52373.333333333336, ans=0.2 2023-12-21 13:15:21,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=52373.333333333336, ans=0.0 2023-12-21 13:15:24,010 INFO [train.py:886] (1/4) Epoch 2, batch 3100, loss[loss=0.02167, audio_tagging_loss=0.02167, over 24750.00 frames. 
], tot_loss[loss=0.01832, audio_tagging_loss=0.01832, over 4952365.08 frames. ], batch size: 99, lr: 3.67e-02, grad_scale: 256.0 2023-12-21 13:15:39,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=52506.666666666664, ans=0.2 2023-12-21 13:15:40,323 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.515e+01 2023-12-21 13:15:58,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=52640.0, ans=0.04949747468305833 2023-12-21 13:16:01,673 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.616e+01 2.830e+01 3.122e+01 4.076e+01, threshold=5.659e+01, percent-clipped=0.0 2023-12-21 13:16:03,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2023-12-21 13:16:11,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0 2023-12-21 13:16:15,757 INFO [train.py:886] (1/4) Epoch 2, batch 3150, loss[loss=0.02067, audio_tagging_loss=0.02067, over 25000.00 frames. ], tot_loss[loss=0.01846, audio_tagging_loss=0.01846, over 4945482.53 frames. ], batch size: 100, lr: 3.66e-02, grad_scale: 256.0 2023-12-21 13:16:19,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=52773.333333333336, ans=0.125 2023-12-21 13:16:23,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=52773.333333333336, ans=0.2 2023-12-21 13:16:32,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=15.0 2023-12-21 13:16:48,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=52973.333333333336, ans=0.125 2023-12-21 13:16:53,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=52973.333333333336, ans=0.5 2023-12-21 13:17:03,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53040.0, ans=0.1 2023-12-21 13:17:07,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=53106.666666666664, ans=0.125 2023-12-21 13:17:08,693 INFO [train.py:886] (1/4) Epoch 2, batch 3200, loss[loss=0.01716, audio_tagging_loss=0.01716, over 25000.00 frames. ], tot_loss[loss=0.01842, audio_tagging_loss=0.01842, over 4944506.69 frames. 
], batch size: 100, lr: 3.65e-02, grad_scale: 256.0 2023-12-21 13:17:18,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=53173.333333333336, ans=0.2 2023-12-21 13:17:21,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53173.333333333336, ans=0.1 2023-12-21 13:17:33,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=53240.0, ans=0.125 2023-12-21 13:17:41,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=53306.666666666664, ans=0.125 2023-12-21 13:17:42,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=53306.666666666664, ans=0.125 2023-12-21 13:17:43,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53306.666666666664, ans=0.1 2023-12-21 13:17:48,110 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.552e+01 2.741e+01 3.134e+01 4.308e+01, threshold=5.481e+01, percent-clipped=0.0 2023-12-21 13:18:04,658 INFO [train.py:886] (1/4) Epoch 2, batch 3250, loss[loss=0.01506, audio_tagging_loss=0.01506, over 22563.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4944620.19 frames. ], batch size: 107, lr: 3.65e-02, grad_scale: 256.0 2023-12-21 13:18:05,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=53440.0, ans=0.125 2023-12-21 13:18:06,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=53440.0, ans=0.125 2023-12-21 13:18:21,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=53506.666666666664, ans=0.125 2023-12-21 13:18:24,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=53573.333333333336, ans=0.125 2023-12-21 13:18:27,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=53573.333333333336, ans=0.0 2023-12-21 13:18:40,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=53640.0, ans=0.125 2023-12-21 13:18:43,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-21 13:18:48,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=53706.666666666664, ans=0.125 2023-12-21 13:18:54,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=30.14 vs. limit=22.5 2023-12-21 13:18:54,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=53773.333333333336, ans=0.2 2023-12-21 13:18:55,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.80 vs. 
limit=22.5 2023-12-21 13:18:55,718 INFO [train.py:886] (1/4) Epoch 2, batch 3300, loss[loss=0.01908, audio_tagging_loss=0.01908, over 25000.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 4943030.08 frames. ], batch size: 100, lr: 3.64e-02, grad_scale: 256.0 2023-12-21 13:19:00,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=53773.333333333336, ans=0.0 2023-12-21 13:19:09,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=53840.0, ans=0.0 2023-12-21 13:19:10,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.61 vs. limit=15.0 2023-12-21 13:19:13,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=53840.0, ans=0.07 2023-12-21 13:19:22,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.05 vs. limit=22.5 2023-12-21 13:19:30,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.42 vs. limit=10.0 2023-12-21 13:19:35,085 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.487e+01 2.709e+01 2.952e+01 3.963e+01, threshold=5.419e+01, percent-clipped=0.0 2023-12-21 13:19:35,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=53973.333333333336, ans=0.125 2023-12-21 13:19:41,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=54040.0, ans=0.0 2023-12-21 13:19:45,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=54040.0, ans=0.07 2023-12-21 13:19:46,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2023-12-21 13:19:50,320 INFO [train.py:886] (1/4) Epoch 2, batch 3350, loss[loss=0.021, audio_tagging_loss=0.021, over 24750.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 4947365.18 frames. ], batch size: 99, lr: 3.64e-02, grad_scale: 256.0 2023-12-21 13:19:59,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=54173.333333333336, ans=0.125 2023-12-21 13:20:07,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=54173.333333333336, ans=0.5 2023-12-21 13:20:07,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2023-12-21 13:20:10,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=54240.0, ans=0.125 2023-12-21 13:20:20,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=54306.666666666664, ans=0.0 2023-12-21 13:20:21,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.28 vs. 
limit=22.5 2023-12-21 13:20:21,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.89 vs. limit=10.0 2023-12-21 13:20:23,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=54306.666666666664, ans=0.0 2023-12-21 13:20:30,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-12-21 13:20:32,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=54373.333333333336, ans=0.125 2023-12-21 13:20:36,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=54373.333333333336, ans=0.09899494936611666 2023-12-21 13:20:40,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=54440.0, ans=0.07 2023-12-21 13:20:41,757 INFO [train.py:886] (1/4) Epoch 2, batch 3400, loss[loss=0.0188, audio_tagging_loss=0.0188, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4954081.08 frames. ], batch size: 100, lr: 3.63e-02, grad_scale: 256.0 2023-12-21 13:21:02,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=54573.333333333336, ans=0.0 2023-12-21 13:21:09,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=54573.333333333336, ans=0.0 2023-12-21 13:21:20,897 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.559e+01 2.794e+01 3.054e+01 3.708e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 13:21:34,419 INFO [train.py:886] (1/4) Epoch 2, batch 3450, loss[loss=0.0192, audio_tagging_loss=0.0192, over 24750.00 frames. ], tot_loss[loss=0.01831, audio_tagging_loss=0.01831, over 4950918.61 frames. ], batch size: 99, lr: 3.62e-02, grad_scale: 256.0 2023-12-21 13:21:35,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=54773.333333333336, ans=0.09899494936611666 2023-12-21 13:22:00,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54906.666666666664, ans=0.1 2023-12-21 13:22:20,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-12-21 13:22:28,220 INFO [train.py:886] (1/4) Epoch 2, batch 3500, loss[loss=0.019, audio_tagging_loss=0.019, over 25000.00 frames. ], tot_loss[loss=0.01836, audio_tagging_loss=0.01836, over 4947600.39 frames. ], batch size: 100, lr: 3.62e-02, grad_scale: 512.0 2023-12-21 13:22:39,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=55173.333333333336, ans=0.2 2023-12-21 13:22:40,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=55173.333333333336, ans=0.125 2023-12-21 13:22:41,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. 
limit=15.0 2023-12-21 13:22:52,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=55240.0, ans=0.125 2023-12-21 13:23:01,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=55306.666666666664, ans=0.125 2023-12-21 13:23:05,438 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.571e+01 2.817e+01 3.195e+01 5.368e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-21 13:23:07,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=55306.666666666664, ans=0.0 2023-12-21 13:23:11,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=55373.333333333336, ans=0.125 2023-12-21 13:23:16,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55373.333333333336, ans=0.0 2023-12-21 13:23:19,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2023-12-21 13:23:19,474 INFO [train.py:886] (1/4) Epoch 2, batch 3550, loss[loss=0.01818, audio_tagging_loss=0.01818, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4949503.60 frames. ], batch size: 99, lr: 3.61e-02, grad_scale: 512.0 2023-12-21 13:23:21,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=55440.0, ans=0.125 2023-12-21 13:23:52,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=55640.0, ans=0.0 2023-12-21 13:23:53,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=55640.0, ans=0.0 2023-12-21 13:24:08,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=55706.666666666664, ans=0.125 2023-12-21 13:24:11,734 INFO [train.py:886] (1/4) Epoch 2, batch 3600, loss[loss=0.0217, audio_tagging_loss=0.0217, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4951460.03 frames. ], batch size: 100, lr: 3.61e-02, grad_scale: 512.0 2023-12-21 13:24:21,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.93 vs. limit=10.0 2023-12-21 13:24:23,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=55840.0, ans=0.125 2023-12-21 13:24:31,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55906.666666666664, ans=0.1 2023-12-21 13:24:39,819 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:24:42,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=55906.666666666664, ans=15.0 2023-12-21 13:24:43,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. 
limit=6.0 2023-12-21 13:24:44,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=55973.333333333336, ans=0.0 2023-12-21 13:24:50,575 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.558e+01 2.810e+01 3.070e+01 4.011e+01, threshold=5.620e+01, percent-clipped=0.0 2023-12-21 13:25:04,355 INFO [train.py:886] (1/4) Epoch 2, batch 3650, loss[loss=0.01946, audio_tagging_loss=0.01946, over 24750.00 frames. ], tot_loss[loss=0.01809, audio_tagging_loss=0.01809, over 4957327.29 frames. ], batch size: 99, lr: 3.60e-02, grad_scale: 256.0 2023-12-21 13:25:13,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56106.666666666664, ans=0.1 2023-12-21 13:25:14,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=56173.333333333336, ans=0.125 2023-12-21 13:25:19,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.93 vs. limit=10.0 2023-12-21 13:25:27,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=56240.0, ans=0.1 2023-12-21 13:25:50,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=56373.333333333336, ans=0.0 2023-12-21 13:25:56,767 INFO [train.py:886] (1/4) Epoch 2, batch 3700, loss[loss=0.01961, audio_tagging_loss=0.01961, over 25000.00 frames. ], tot_loss[loss=0.01817, audio_tagging_loss=0.01817, over 4962589.46 frames. ], batch size: 100, lr: 3.59e-02, grad_scale: 256.0 2023-12-21 13:26:06,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=56506.666666666664, ans=0.125 2023-12-21 13:26:07,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=56506.666666666664, ans=0.125 2023-12-21 13:26:22,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=56573.333333333336, ans=0.125 2023-12-21 13:26:30,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.98 vs. limit=22.5 2023-12-21 13:26:31,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=12.0 2023-12-21 13:26:35,075 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.535e+01 2.837e+01 3.085e+01 3.878e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 13:26:50,309 INFO [train.py:886] (1/4) Epoch 2, batch 3750, loss[loss=0.01991, audio_tagging_loss=0.01991, over 24750.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 4961605.88 frames. 
], batch size: 99, lr: 3.59e-02, grad_scale: 256.0 2023-12-21 13:27:00,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=56840.0, ans=0.125 2023-12-21 13:27:16,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=56906.666666666664, ans=0.2 2023-12-21 13:27:17,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2023-12-21 13:27:26,315 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.653e+01 2023-12-21 13:27:33,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=57040.0, ans=0.09899494936611666 2023-12-21 13:27:38,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=57040.0, ans=0.1 2023-12-21 13:27:41,336 INFO [train.py:886] (1/4) Epoch 2, batch 3800, loss[loss=0.01866, audio_tagging_loss=0.01866, over 25000.00 frames. ], tot_loss[loss=0.01844, audio_tagging_loss=0.01844, over 4951955.15 frames. ], batch size: 100, lr: 3.58e-02, grad_scale: 256.0 2023-12-21 13:27:48,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=57106.666666666664, ans=0.0 2023-12-21 13:27:53,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=57173.333333333336, ans=0.125 2023-12-21 13:27:59,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=57173.333333333336, ans=0.0 2023-12-21 13:28:20,922 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.558e+01 2.812e+01 3.070e+01 5.505e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 13:28:26,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2023-12-21 13:28:34,236 INFO [train.py:886] (1/4) Epoch 2, batch 3850, loss[loss=0.01767, audio_tagging_loss=0.01767, over 24750.00 frames. ], tot_loss[loss=0.01836, audio_tagging_loss=0.01836, over 4941177.83 frames. ], batch size: 99, lr: 3.58e-02, grad_scale: 256.0 2023-12-21 13:28:58,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=57573.333333333336, ans=0.125 2023-12-21 13:29:11,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0 2023-12-21 13:29:27,439 INFO [train.py:886] (1/4) Epoch 2, batch 3900, loss[loss=0.01661, audio_tagging_loss=0.01661, over 25000.00 frames. ], tot_loss[loss=0.01821, audio_tagging_loss=0.01821, over 4944466.71 frames. 
], batch size: 100, lr: 3.57e-02, grad_scale: 256.0 2023-12-21 13:29:39,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=57840.0, ans=0.125 2023-12-21 13:30:00,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=57973.333333333336, ans=0.125 2023-12-21 13:30:05,871 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.478e+01 2.658e+01 2.976e+01 3.993e+01, threshold=5.317e+01, percent-clipped=0.0 2023-12-21 13:30:07,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2023-12-21 13:30:12,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=58040.0, ans=0.0 2023-12-21 13:30:19,061 INFO [train.py:886] (1/4) Epoch 2, batch 3950, loss[loss=0.01617, audio_tagging_loss=0.01617, over 25000.00 frames. ], tot_loss[loss=0.01821, audio_tagging_loss=0.01821, over 4943751.98 frames. ], batch size: 100, lr: 3.56e-02, grad_scale: 256.0 2023-12-21 13:30:20,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=58106.666666666664, ans=0.0 2023-12-21 13:30:26,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58106.666666666664, ans=0.125 2023-12-21 13:30:34,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.72 vs. limit=15.0 2023-12-21 13:30:38,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.24 vs. limit=15.0 2023-12-21 13:30:44,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=58240.0, ans=0.0 2023-12-21 13:30:45,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.87 vs. limit=22.5 2023-12-21 13:30:48,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58240.0, ans=0.1 2023-12-21 13:30:56,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2023-12-21 13:31:12,223 INFO [train.py:886] (1/4) Epoch 2, batch 4000, loss[loss=0.01718, audio_tagging_loss=0.01718, over 25000.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 4942252.04 frames. ], batch size: 100, lr: 3.56e-02, grad_scale: 256.0 2023-12-21 13:31:17,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=58440.0, ans=0.02 2023-12-21 13:31:18,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.66 vs. limit=15.0 2023-12-21 13:31:19,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.20 vs. 
limit=22.5 2023-12-21 13:31:37,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=58573.333333333336, ans=0.125 2023-12-21 13:31:50,751 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.643e+01 2.871e+01 3.262e+01 4.395e+01, threshold=5.743e+01, percent-clipped=0.0 2023-12-21 13:31:53,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=58706.666666666664, ans=0.125 2023-12-21 13:32:00,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.0 2023-12-21 13:32:04,045 INFO [train.py:886] (1/4) Epoch 2, batch 4050, loss[loss=0.01964, audio_tagging_loss=0.01964, over 25000.00 frames. ], tot_loss[loss=0.0182, audio_tagging_loss=0.0182, over 4943630.01 frames. ], batch size: 100, lr: 3.55e-02, grad_scale: 256.0 2023-12-21 13:32:09,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58773.333333333336, ans=0.1 2023-12-21 13:32:20,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=58840.0, ans=0.0 2023-12-21 13:32:28,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-12-21 13:32:45,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=59040.0, ans=0.2 2023-12-21 13:32:45,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=59040.0, ans=0.125 2023-12-21 13:32:53,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=59040.0, ans=0.07 2023-12-21 13:32:54,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59040.0, ans=0.1 2023-12-21 13:32:56,521 INFO [train.py:886] (1/4) Epoch 2, batch 4100, loss[loss=0.01875, audio_tagging_loss=0.01875, over 24750.00 frames. ], tot_loss[loss=0.0183, audio_tagging_loss=0.0183, over 4941333.30 frames. ], batch size: 99, lr: 3.55e-02, grad_scale: 256.0 2023-12-21 13:32:56,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.50 vs. limit=15.0 2023-12-21 13:33:07,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.35 vs. limit=22.5 2023-12-21 13:33:08,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=59173.333333333336, ans=0.0 2023-12-21 13:33:11,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=59173.333333333336, ans=0.0 2023-12-21 13:33:20,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=59240.0, ans=10.0 2023-12-21 13:33:23,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. 
limit=15.0 2023-12-21 13:33:27,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=59306.666666666664, ans=0.2 2023-12-21 13:33:27,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59306.666666666664, ans=0.1 2023-12-21 13:33:34,104 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.565e+01 2.821e+01 3.074e+01 4.312e+01, threshold=5.642e+01, percent-clipped=0.0 2023-12-21 13:33:47,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=59373.333333333336, ans=0.125 2023-12-21 13:33:48,665 INFO [train.py:886] (1/4) Epoch 2, batch 4150, loss[loss=0.01757, audio_tagging_loss=0.01757, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4942408.41 frames. ], batch size: 99, lr: 3.54e-02, grad_scale: 256.0 2023-12-21 13:33:51,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=59440.0, ans=0.2 2023-12-21 13:33:52,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=59440.0, ans=0.0 2023-12-21 13:34:17,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=59573.333333333336, ans=0.0 2023-12-21 13:34:24,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=59640.0, ans=0.125 2023-12-21 13:34:27,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=15.0 2023-12-21 13:34:29,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=59706.666666666664, ans=0.0 2023-12-21 13:34:40,079 INFO [train.py:886] (1/4) Epoch 2, batch 4200, loss[loss=0.01739, audio_tagging_loss=0.01739, over 25000.00 frames. ], tot_loss[loss=0.0181, audio_tagging_loss=0.0181, over 4945177.70 frames. ], batch size: 100, lr: 3.53e-02, grad_scale: 256.0 2023-12-21 13:34:49,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=59840.0, ans=0.0 2023-12-21 13:35:18,624 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.515e+01 2.712e+01 3.027e+01 3.804e+01, threshold=5.424e+01, percent-clipped=0.0 2023-12-21 13:35:18,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=59973.333333333336, ans=0.1 2023-12-21 13:35:31,823 INFO [train.py:886] (1/4) Epoch 2, batch 4250, loss[loss=0.01904, audio_tagging_loss=0.01904, over 24750.00 frames. ], tot_loss[loss=0.01812, audio_tagging_loss=0.01812, over 4947835.26 frames. ], batch size: 99, lr: 3.53e-02, grad_scale: 256.0 2023-12-21 13:35:47,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.94 vs. 
limit=10.0 2023-12-21 13:35:47,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=60173.333333333336, ans=0.5 2023-12-21 13:35:59,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=60240.0, ans=0.05 2023-12-21 13:36:03,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.26 vs. limit=22.5 2023-12-21 13:36:17,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=60373.333333333336, ans=0.0 2023-12-21 13:36:21,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=60373.333333333336, ans=0.0 2023-12-21 13:36:23,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=60440.0, ans=0.5 2023-12-21 13:36:24,764 INFO [train.py:886] (1/4) Epoch 2, batch 4300, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01804, audio_tagging_loss=0.01804, over 4943760.65 frames. ], batch size: 100, lr: 3.52e-02, grad_scale: 256.0 2023-12-21 13:36:31,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=60440.0, ans=0.2 2023-12-21 13:36:47,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=60573.333333333336, ans=0.2 2023-12-21 13:36:56,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=60640.0, ans=0.125 2023-12-21 13:37:03,037 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.538e+01 2.771e+01 3.049e+01 3.843e+01, threshold=5.542e+01, percent-clipped=0.0 2023-12-21 13:37:08,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=60706.666666666664, ans=0.05 2023-12-21 13:37:15,381 INFO [train.py:886] (1/4) Epoch 2, batch 4350, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.0182, audio_tagging_loss=0.0182, over 4954072.91 frames. ], batch size: 100, lr: 3.52e-02, grad_scale: 256.0 2023-12-21 13:37:34,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60840.0, ans=0.1 2023-12-21 13:37:39,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.02 vs. limit=22.5 2023-12-21 13:37:41,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=60906.666666666664, ans=0.2 2023-12-21 13:37:42,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=60906.666666666664, ans=0.125 2023-12-21 13:37:52,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.05 vs. 
limit=15.0 2023-12-21 13:38:02,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=61040.0, ans=0.0 2023-12-21 13:38:08,516 INFO [train.py:886] (1/4) Epoch 2, batch 4400, loss[loss=0.01642, audio_tagging_loss=0.01642, over 25000.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4950303.60 frames. ], batch size: 100, lr: 3.51e-02, grad_scale: 256.0 2023-12-21 13:38:10,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=15.0 2023-12-21 13:38:14,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-21 13:38:16,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61106.666666666664, ans=0.125 2023-12-21 13:38:40,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=61306.666666666664, ans=0.125 2023-12-21 13:38:46,108 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.594e+01 2.828e+01 3.102e+01 3.980e+01, threshold=5.657e+01, percent-clipped=0.0 2023-12-21 13:38:49,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=61373.333333333336, ans=0.125 2023-12-21 13:38:53,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=61373.333333333336, ans=0.125 2023-12-21 13:38:54,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=61373.333333333336, ans=0.125 2023-12-21 13:38:59,967 INFO [train.py:886] (1/4) Epoch 2, batch 4450, loss[loss=0.02033, audio_tagging_loss=0.02033, over 24750.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 4946864.31 frames. ], batch size: 99, lr: 3.51e-02, grad_scale: 256.0 2023-12-21 13:39:11,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=61506.666666666664, ans=0.125 2023-12-21 13:39:11,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-12-21 13:39:17,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.80 vs. 
limit=12.0 2023-12-21 13:39:18,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=61506.666666666664, ans=0.125 2023-12-21 13:39:22,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=61573.333333333336, ans=0.125 2023-12-21 13:39:22,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=61573.333333333336, ans=0.125 2023-12-21 13:39:22,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=61573.333333333336, ans=0.2 2023-12-21 13:39:23,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=61573.333333333336, ans=0.0 2023-12-21 13:39:25,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61573.333333333336, ans=0.1 2023-12-21 13:39:30,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=61640.0, ans=0.0 2023-12-21 13:39:51,930 INFO [train.py:886] (1/4) Epoch 2, batch 4500, loss[loss=0.01708, audio_tagging_loss=0.01708, over 25000.00 frames. ], tot_loss[loss=0.01817, audio_tagging_loss=0.01817, over 4953305.23 frames. ], batch size: 100, lr: 3.50e-02, grad_scale: 256.0 2023-12-21 13:39:54,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61773.333333333336, ans=0.1 2023-12-21 13:39:55,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=61773.333333333336, ans=0.1 2023-12-21 13:39:58,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2023-12-21 13:40:02,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.24 vs. limit=15.0 2023-12-21 13:40:07,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=61840.0, ans=12.0 2023-12-21 13:40:11,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=61840.0, ans=0.0 2023-12-21 13:40:18,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=61906.666666666664, ans=0.125 2023-12-21 13:40:24,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=61973.333333333336, ans=0.5 2023-12-21 13:40:29,937 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.527e+01 2.862e+01 3.154e+01 4.163e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-21 13:40:30,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0 2023-12-21 13:40:44,706 INFO [train.py:886] (1/4) Epoch 2, batch 4550, loss[loss=0.01959, audio_tagging_loss=0.01959, over 24750.00 frames. ], tot_loss[loss=0.01804, audio_tagging_loss=0.01804, over 4952970.58 frames. 
], batch size: 99, lr: 3.49e-02, grad_scale: 256.0 2023-12-21 13:40:45,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=15.0 2023-12-21 13:40:47,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2023-12-21 13:40:49,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=62106.666666666664, ans=0.95 2023-12-21 13:41:06,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=62240.0, ans=0.125 2023-12-21 13:41:06,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.79 vs. limit=10.0 2023-12-21 13:41:12,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=62240.0, ans=0.1 2023-12-21 13:41:20,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=62306.666666666664, ans=0.0 2023-12-21 13:41:29,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=62373.333333333336, ans=0.0 2023-12-21 13:41:33,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=62373.333333333336, ans=0.015 2023-12-21 13:41:35,508 INFO [train.py:886] (1/4) Epoch 2, batch 4600, loss[loss=0.01899, audio_tagging_loss=0.01899, over 25000.00 frames. ], tot_loss[loss=0.018, audio_tagging_loss=0.018, over 4957444.30 frames. ], batch size: 100, lr: 3.49e-02, grad_scale: 256.0 2023-12-21 13:41:35,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=62440.0, ans=0.125 2023-12-21 13:41:46,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=62506.666666666664, ans=0.1 2023-12-21 13:41:49,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=62506.666666666664, ans=0.0 2023-12-21 13:41:52,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=62506.666666666664, ans=0.125 2023-12-21 13:42:09,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2023-12-21 13:42:15,552 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.477e+01 2.661e+01 2.958e+01 4.591e+01, threshold=5.321e+01, percent-clipped=0.0 2023-12-21 13:42:15,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=62640.0, ans=0.0 2023-12-21 13:42:17,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.14 vs. 
limit=22.5 2023-12-21 13:42:23,035 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:42:29,644 INFO [train.py:886] (1/4) Epoch 2, batch 4650, loss[loss=0.01758, audio_tagging_loss=0.01758, over 24054.00 frames. ], tot_loss[loss=0.01796, audio_tagging_loss=0.01796, over 4956891.65 frames. ], batch size: 100, lr: 3.48e-02, grad_scale: 256.0 2023-12-21 13:42:31,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=62773.333333333336, ans=0.0 2023-12-21 13:42:32,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=62773.333333333336, ans=0.125 2023-12-21 13:42:41,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2023-12-21 13:42:53,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=62906.666666666664, ans=0.125 2023-12-21 13:42:55,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.57 vs. limit=15.0 2023-12-21 13:43:01,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=62973.333333333336, ans=0.2 2023-12-21 13:43:15,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=15.0 2023-12-21 13:43:19,731 INFO [train.py:886] (1/4) Epoch 2, batch 4700, loss[loss=0.02042, audio_tagging_loss=0.02042, over 24750.00 frames. ], tot_loss[loss=0.01818, audio_tagging_loss=0.01818, over 4958604.65 frames. ], batch size: 99, lr: 3.48e-02, grad_scale: 256.0 2023-12-21 13:43:25,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=63106.666666666664, ans=22.5 2023-12-21 13:43:36,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-21 13:43:55,355 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.602e+01 2.941e+01 3.244e+01 4.815e+01, threshold=5.882e+01, percent-clipped=0.0 2023-12-21 13:43:58,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=63373.333333333336, ans=0.125 2023-12-21 13:44:04,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63373.333333333336, ans=0.1 2023-12-21 13:44:07,438 INFO [train.py:886] (1/4) Epoch 2, batch 4750, loss[loss=0.01927, audio_tagging_loss=0.01927, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4957712.74 frames. ], batch size: 99, lr: 3.47e-02, grad_scale: 256.0 2023-12-21 13:44:09,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.82 vs. 
limit=15.0 2023-12-21 13:44:19,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=63506.666666666664, ans=0.125 2023-12-21 13:44:45,619 INFO [train.py:886] (1/4) Epoch 3, batch 0, loss[loss=0.04927, audio_tagging_loss=0.04927, over 21764.00 frames. ], tot_loss[loss=0.04927, audio_tagging_loss=0.04927, over 21764.00 frames. ], batch size: 107, lr: 3.30e-02, grad_scale: 256.0 2023-12-21 13:44:45,619 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 13:45:06,677 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.0151, 2.5476, 2.0234, 2.1170], device='cuda:1') 2023-12-21 13:45:08,233 INFO [train.py:917] (1/4) Epoch 3, validation: loss=0.04026, audio_tagging_loss=0.04026, over 3737520.00 frames. 2023-12-21 13:45:08,234 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 13:45:09,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=63546.666666666664, ans=10.0 2023-12-21 13:45:12,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.46 vs. limit=15.0 2023-12-21 13:45:25,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=63613.333333333336, ans=0.0 2023-12-21 13:45:29,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=20.61 vs. limit=15.0 2023-12-21 13:45:32,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=63680.0, ans=0.125 2023-12-21 13:45:42,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=63746.666666666664, ans=0.07 2023-12-21 13:46:01,488 INFO [train.py:886] (1/4) Epoch 3, batch 50, loss[loss=0.02407, audio_tagging_loss=0.02407, over 25000.00 frames. ], tot_loss[loss=0.0288, audio_tagging_loss=0.0288, over 1117286.01 frames. ], batch size: 100, lr: 3.29e-02, grad_scale: 64.0 2023-12-21 13:46:04,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0 2023-12-21 13:46:12,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=12.0 2023-12-21 13:46:18,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=63946.666666666664, ans=0.0 2023-12-21 13:46:20,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=64013.333333333336, ans=0.2 2023-12-21 13:46:22,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.48 vs. limit=22.5 2023-12-21 13:46:23,537 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 2.959e+01 3.288e+01 3.830e+01 1.189e+02, threshold=6.575e+01, percent-clipped=4.0 2023-12-21 13:46:25,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.74 vs. 
limit=22.5 2023-12-21 13:46:38,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-12-21 13:46:51,664 INFO [train.py:886] (1/4) Epoch 3, batch 100, loss[loss=0.01866, audio_tagging_loss=0.01866, over 25000.00 frames. ], tot_loss[loss=0.0248, audio_tagging_loss=0.0248, over 1971775.49 frames. ], batch size: 100, lr: 3.29e-02, grad_scale: 64.0 2023-12-21 13:47:07,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=64280.0, ans=0.125 2023-12-21 13:47:18,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=64346.666666666664, ans=0.2 2023-12-21 13:47:20,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=64346.666666666664, ans=0.0 2023-12-21 13:47:30,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2023-12-21 13:47:43,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=64546.666666666664, ans=0.125 2023-12-21 13:47:44,259 INFO [train.py:886] (1/4) Epoch 3, batch 150, loss[loss=0.02294, audio_tagging_loss=0.02294, over 25000.00 frames. ], tot_loss[loss=0.02245, audio_tagging_loss=0.02245, over 2632656.03 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 64.0 2023-12-21 13:47:58,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64613.333333333336, ans=0.125 2023-12-21 13:48:06,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2023-12-21 13:48:07,592 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.621e+01 2.918e+01 3.112e+01 3.943e+01, threshold=5.836e+01, percent-clipped=0.0 2023-12-21 13:48:11,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=64680.0, ans=10.0 2023-12-21 13:48:35,602 INFO [train.py:886] (1/4) Epoch 3, batch 200, loss[loss=0.01473, audio_tagging_loss=0.01473, over 25000.00 frames. ], tot_loss[loss=0.02085, audio_tagging_loss=0.02085, over 3147562.97 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 64.0 2023-12-21 13:48:39,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64880.0, ans=0.1 2023-12-21 13:48:41,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=15.0 2023-12-21 13:48:49,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=64946.666666666664, ans=0.0 2023-12-21 13:48:57,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65013.333333333336, ans=0.1 2023-12-21 13:49:00,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. 
limit=15.0 2023-12-21 13:49:16,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=65080.0, ans=0.2 2023-12-21 13:49:25,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=65146.666666666664, ans=0.0 2023-12-21 13:49:28,035 INFO [train.py:886] (1/4) Epoch 3, batch 250, loss[loss=0.01718, audio_tagging_loss=0.01718, over 24033.00 frames. ], tot_loss[loss=0.02011, audio_tagging_loss=0.02011, over 3551807.41 frames. ], batch size: 100, lr: 3.27e-02, grad_scale: 64.0 2023-12-21 13:49:49,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=65346.666666666664, ans=0.2 2023-12-21 13:49:51,904 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.523e+01 2.809e+01 3.152e+01 4.163e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-21 13:49:59,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=65413.333333333336, ans=0.125 2023-12-21 13:50:00,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=65413.333333333336, ans=0.125 2023-12-21 13:50:01,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=65413.333333333336, ans=0.125 2023-12-21 13:50:05,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=65413.333333333336, ans=0.2 2023-12-21 13:50:17,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65480.0, ans=0.1 2023-12-21 13:50:20,242 INFO [train.py:886] (1/4) Epoch 3, batch 300, loss[loss=0.0176, audio_tagging_loss=0.0176, over 24750.00 frames. ], tot_loss[loss=0.01956, audio_tagging_loss=0.01956, over 3858021.19 frames. ], batch size: 99, lr: 3.27e-02, grad_scale: 64.0 2023-12-21 13:50:20,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=65546.66666666667, ans=0.0 2023-12-21 13:50:21,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=65546.66666666667, ans=0.125 2023-12-21 13:50:24,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=65546.66666666667, ans=0.125 2023-12-21 13:50:25,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=65546.66666666667, ans=0.2 2023-12-21 13:50:55,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65746.66666666667, ans=0.125 2023-12-21 13:50:56,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0 2023-12-21 13:51:07,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=65813.33333333333, ans=0.125 2023-12-21 13:51:10,614 INFO [train.py:886] (1/4) Epoch 3, batch 350, loss[loss=0.02189, audio_tagging_loss=0.02189, over 24750.00 frames. ], tot_loss[loss=0.01922, audio_tagging_loss=0.01922, over 4096502.17 frames. 
], batch size: 99, lr: 3.26e-02, grad_scale: 64.0 2023-12-21 13:51:12,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0 2023-12-21 13:51:27,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=53.55 vs. limit=22.5 2023-12-21 13:51:31,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-21 13:51:32,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=66013.33333333333, ans=0.0 2023-12-21 13:51:35,294 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.550e+01 2.783e+01 3.112e+01 3.866e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 13:51:36,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66013.33333333333, ans=0.125 2023-12-21 13:51:56,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=66146.66666666667, ans=0.125 2023-12-21 13:52:03,169 INFO [train.py:886] (1/4) Epoch 3, batch 400, loss[loss=0.02056, audio_tagging_loss=0.02056, over 25000.00 frames. ], tot_loss[loss=0.01881, audio_tagging_loss=0.01881, over 4284008.81 frames. ], batch size: 100, lr: 3.25e-02, grad_scale: 64.0 2023-12-21 13:52:03,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66213.33333333333, ans=0.1 2023-12-21 13:52:05,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2023-12-21 13:52:13,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-12-21 13:52:15,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=66280.0, ans=0.2 2023-12-21 13:52:41,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=66413.33333333333, ans=0.125 2023-12-21 13:52:45,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=66480.0, ans=0.0 2023-12-21 13:52:53,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=66546.66666666667, ans=0.0 2023-12-21 13:52:54,319 INFO [train.py:886] (1/4) Epoch 3, batch 450, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 4434152.23 frames. ], batch size: 99, lr: 3.25e-02, grad_scale: 64.0 2023-12-21 13:53:02,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.70 vs. 
limit=12.0 2023-12-21 13:53:04,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66613.33333333333, ans=0.1 2023-12-21 13:53:05,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.06 vs. limit=6.0 2023-12-21 13:53:06,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=66613.33333333333, ans=0.125 2023-12-21 13:53:17,868 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.418e+01 2.694e+01 2.965e+01 4.467e+01, threshold=5.389e+01, percent-clipped=0.0 2023-12-21 13:53:21,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=66680.0, ans=0.125 2023-12-21 13:53:41,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=66813.33333333333, ans=0.0 2023-12-21 13:53:46,581 INFO [train.py:886] (1/4) Epoch 3, batch 500, loss[loss=0.01725, audio_tagging_loss=0.01725, over 25000.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4548079.21 frames. ], batch size: 100, lr: 3.24e-02, grad_scale: 64.0 2023-12-21 13:54:09,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=67013.33333333333, ans=0.125 2023-12-21 13:54:28,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67146.66666666667, ans=0.1 2023-12-21 13:54:38,665 INFO [train.py:886] (1/4) Epoch 3, batch 550, loss[loss=0.01451, audio_tagging_loss=0.01451, over 25000.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4639218.07 frames. ], batch size: 100, lr: 3.24e-02, grad_scale: 64.0 2023-12-21 13:55:02,552 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.576e+01 2.807e+01 3.063e+01 3.994e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 13:55:11,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=67413.33333333333, ans=0.0 2023-12-21 13:55:14,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=67413.33333333333, ans=10.0 2023-12-21 13:55:14,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=67413.33333333333, ans=0.125 2023-12-21 13:55:18,844 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.440e+01 2023-12-21 13:55:28,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=67480.0, ans=0.0 2023-12-21 13:55:29,073 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.567e+00 2023-12-21 13:55:29,819 INFO [train.py:886] (1/4) Epoch 3, batch 600, loss[loss=0.01684, audio_tagging_loss=0.01684, over 24750.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4710056.20 frames. 
], batch size: 99, lr: 3.23e-02, grad_scale: 64.0 2023-12-21 13:55:51,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=67680.0, ans=0.0 2023-12-21 13:55:51,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=67680.0, ans=0.2 2023-12-21 13:55:51,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.20 vs. limit=15.0 2023-12-21 13:55:54,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=67680.0, ans=0.2 2023-12-21 13:56:18,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.07 vs. limit=6.0 2023-12-21 13:56:22,132 INFO [train.py:886] (1/4) Epoch 3, batch 650, loss[loss=0.01786, audio_tagging_loss=0.01786, over 24750.00 frames. ], tot_loss[loss=0.01831, audio_tagging_loss=0.01831, over 4754544.20 frames. ], batch size: 99, lr: 3.23e-02, grad_scale: 64.0 2023-12-21 13:56:29,527 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:56:34,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=67946.66666666667, ans=0.125 2023-12-21 13:56:44,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=68013.33333333333, ans=0.2 2023-12-21 13:56:46,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=15.0 2023-12-21 13:56:46,692 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.530e+01 2.784e+01 3.010e+01 3.967e+01, threshold=5.567e+01, percent-clipped=0.0 2023-12-21 13:56:47,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=68013.33333333333, ans=0.125 2023-12-21 13:56:56,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=68080.0, ans=0.125 2023-12-21 13:56:58,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=68080.0, ans=0.125 2023-12-21 13:56:59,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=68080.0, ans=0.0 2023-12-21 13:57:04,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=15.0 2023-12-21 13:57:15,061 INFO [train.py:886] (1/4) Epoch 3, batch 700, loss[loss=0.01876, audio_tagging_loss=0.01876, over 24750.00 frames. ], tot_loss[loss=0.01825, audio_tagging_loss=0.01825, over 4795464.46 frames. 
], batch size: 99, lr: 3.22e-02, grad_scale: 64.0 2023-12-21 13:57:37,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=68346.66666666667, ans=0.0 2023-12-21 13:57:53,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68413.33333333333, ans=0.1 2023-12-21 13:57:58,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=68480.0, ans=0.125 2023-12-21 13:58:06,477 INFO [train.py:886] (1/4) Epoch 3, batch 750, loss[loss=0.0186, audio_tagging_loss=0.0186, over 22350.00 frames. ], tot_loss[loss=0.01817, audio_tagging_loss=0.01817, over 4828650.60 frames. ], batch size: 107, lr: 3.22e-02, grad_scale: 64.0 2023-12-21 13:58:14,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.97 vs. limit=22.5 2023-12-21 13:58:23,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=68613.33333333333, ans=0.0 2023-12-21 13:58:26,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=68680.0, ans=0.2 2023-12-21 13:58:29,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68680.0, ans=0.1 2023-12-21 13:58:30,490 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.631e+01 2.824e+01 3.219e+01 3.992e+01, threshold=5.647e+01, percent-clipped=0.0 2023-12-21 13:58:35,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68680.0, ans=0.1 2023-12-21 13:58:59,115 INFO [train.py:886] (1/4) Epoch 3, batch 800, loss[loss=0.01976, audio_tagging_loss=0.01976, over 25000.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 4854334.30 frames. ], batch size: 100, lr: 3.21e-02, grad_scale: 64.0 2023-12-21 13:59:01,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=68880.0, ans=0.07 2023-12-21 13:59:12,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2023-12-21 13:59:28,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-12-21 13:59:32,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69080.0, ans=0.1 2023-12-21 13:59:42,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=69146.66666666667, ans=0.125 2023-12-21 13:59:50,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-21 13:59:50,505 INFO [train.py:886] (1/4) Epoch 3, batch 850, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01814, audio_tagging_loss=0.01814, over 4881501.85 frames. 
], batch size: 100, lr: 3.21e-02, grad_scale: 64.0 2023-12-21 13:59:54,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2023-12-21 14:00:14,051 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.575e+01 2.735e+01 3.023e+01 4.765e+01, threshold=5.470e+01, percent-clipped=0.0 2023-12-21 14:00:31,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.32 vs. limit=22.5 2023-12-21 14:00:35,948 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 14:00:42,367 INFO [train.py:886] (1/4) Epoch 3, batch 900, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.01816, audio_tagging_loss=0.01816, over 4900988.73 frames. ], batch size: 100, lr: 3.20e-02, grad_scale: 64.0 2023-12-21 14:00:46,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=69546.66666666667, ans=0.0 2023-12-21 14:01:06,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=69680.0, ans=0.0 2023-12-21 14:01:08,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=69680.0, ans=0.125 2023-12-21 14:01:30,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=69813.33333333333, ans=0.0 2023-12-21 14:01:34,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=69880.0, ans=0.125 2023-12-21 14:01:35,267 INFO [train.py:886] (1/4) Epoch 3, batch 950, loss[loss=0.01823, audio_tagging_loss=0.01823, over 24750.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4910160.17 frames. ], batch size: 99, lr: 3.20e-02, grad_scale: 64.0 2023-12-21 14:01:40,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=69880.0, ans=0.0 2023-12-21 14:01:57,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=70013.33333333333, ans=0.0 2023-12-21 14:01:58,497 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.594e+01 2.861e+01 3.061e+01 4.080e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 14:01:59,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=70013.33333333333, ans=0.07 2023-12-21 14:02:08,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70080.0, ans=0.1 2023-12-21 14:02:08,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70080.0, ans=0.1 2023-12-21 14:02:25,381 INFO [train.py:886] (1/4) Epoch 3, batch 1000, loss[loss=0.02011, audio_tagging_loss=0.02011, over 24750.00 frames. ], tot_loss[loss=0.01806, audio_tagging_loss=0.01806, over 4912922.40 frames. 
], batch size: 99, lr: 3.19e-02, grad_scale: 64.0 2023-12-21 14:02:56,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=15.0 2023-12-21 14:03:18,559 INFO [train.py:886] (1/4) Epoch 3, batch 1050, loss[loss=0.02158, audio_tagging_loss=0.02158, over 25000.00 frames. ], tot_loss[loss=0.01793, audio_tagging_loss=0.01793, over 4922695.40 frames. ], batch size: 100, lr: 3.19e-02, grad_scale: 64.0 2023-12-21 14:03:26,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.86 vs. limit=22.5 2023-12-21 14:03:26,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=15.0 2023-12-21 14:03:35,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.39 vs. limit=15.0 2023-12-21 14:03:43,856 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.533e+01 2.739e+01 3.016e+01 3.717e+01, threshold=5.478e+01, percent-clipped=0.0 2023-12-21 14:04:07,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=70813.33333333333, ans=0.125 2023-12-21 14:04:11,156 INFO [train.py:886] (1/4) Epoch 3, batch 1100, loss[loss=0.02075, audio_tagging_loss=0.02075, over 25000.00 frames. ], tot_loss[loss=0.01787, audio_tagging_loss=0.01787, over 4929669.37 frames. ], batch size: 100, lr: 3.18e-02, grad_scale: 64.0 2023-12-21 14:04:17,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=70880.0, ans=0.125 2023-12-21 14:04:31,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2023-12-21 14:04:36,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=12.0 2023-12-21 14:04:39,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=71013.33333333333, ans=0.125 2023-12-21 14:04:52,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=71146.66666666667, ans=0.125 2023-12-21 14:05:02,658 INFO [train.py:886] (1/4) Epoch 3, batch 1150, loss[loss=0.01754, audio_tagging_loss=0.01754, over 24750.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 4938285.66 frames. 
], batch size: 99, lr: 3.18e-02, grad_scale: 64.0 2023-12-21 14:05:07,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=71213.33333333333, ans=0.2 2023-12-21 14:05:20,103 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.358e-01 2023-12-21 14:05:24,068 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.686e+01 2023-12-21 14:05:25,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=71346.66666666667, ans=0.025 2023-12-21 14:05:27,720 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.554e+01 2.817e+01 3.036e+01 4.286e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-21 14:05:45,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2023-12-21 14:05:56,110 INFO [train.py:886] (1/4) Epoch 3, batch 1200, loss[loss=0.02034, audio_tagging_loss=0.02034, over 25000.00 frames. ], tot_loss[loss=0.01792, audio_tagging_loss=0.01792, over 4942342.37 frames. ], batch size: 100, lr: 3.17e-02, grad_scale: 64.0 2023-12-21 14:06:06,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=71613.33333333333, ans=0.0 2023-12-21 14:06:12,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=71613.33333333333, ans=0.0 2023-12-21 14:06:22,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2023-12-21 14:06:29,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-12-21 14:06:38,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=71813.33333333333, ans=0.0 2023-12-21 14:06:46,189 INFO [train.py:886] (1/4) Epoch 3, batch 1250, loss[loss=0.01742, audio_tagging_loss=0.01742, over 24750.00 frames. ], tot_loss[loss=0.01812, audio_tagging_loss=0.01812, over 4944252.91 frames. ], batch size: 99, lr: 3.17e-02, grad_scale: 64.0 2023-12-21 14:06:52,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=71880.0, ans=0.125 2023-12-21 14:06:54,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=71880.0, ans=0.07 2023-12-21 14:07:10,845 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.406e+01 2.702e+01 2.933e+01 3.398e+01, threshold=5.404e+01, percent-clipped=0.0 2023-12-21 14:07:12,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=15.0 2023-12-21 14:07:14,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.77 vs. 
limit=22.5 2023-12-21 14:07:20,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=72080.0, ans=0.0 2023-12-21 14:07:28,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2023-12-21 14:07:32,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2023-12-21 14:07:34,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=72146.66666666667, ans=0.0 2023-12-21 14:07:37,263 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.252e+01 2023-12-21 14:07:38,888 INFO [train.py:886] (1/4) Epoch 3, batch 1300, loss[loss=0.01854, audio_tagging_loss=0.01854, over 25000.00 frames. ], tot_loss[loss=0.01811, audio_tagging_loss=0.01811, over 4944111.56 frames. ], batch size: 100, lr: 3.16e-02, grad_scale: 64.0 2023-12-21 14:07:51,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.74 vs. limit=22.5 2023-12-21 14:08:09,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72413.33333333333, ans=0.1 2023-12-21 14:08:21,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=72480.0, ans=0.125 2023-12-21 14:08:23,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=15.0 2023-12-21 14:08:23,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-12-21 14:08:29,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.95 vs. limit=10.0 2023-12-21 14:08:31,419 INFO [train.py:886] (1/4) Epoch 3, batch 1350, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.0181, audio_tagging_loss=0.0181, over 4947331.51 frames. ], batch size: 100, lr: 3.16e-02, grad_scale: 64.0 2023-12-21 14:08:41,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=15.0 2023-12-21 14:08:54,018 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.555e+01 2.757e+01 3.030e+01 4.227e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 14:09:01,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0 2023-12-21 14:09:11,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=72813.33333333333, ans=0.035 2023-12-21 14:09:12,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=72813.33333333333, ans=0.125 2023-12-21 14:09:14,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. 
limit=15.0 2023-12-21 14:09:21,827 INFO [train.py:886] (1/4) Epoch 3, batch 1400, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01812, audio_tagging_loss=0.01812, over 4951057.73 frames. ], batch size: 99, lr: 3.15e-02, grad_scale: 64.0 2023-12-21 14:09:25,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=72880.0, ans=0.125 2023-12-21 14:09:27,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=72880.0, ans=0.125 2023-12-21 14:09:34,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2023-12-21 14:09:35,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72946.66666666667, ans=0.125 2023-12-21 14:09:36,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=72946.66666666667, ans=15.0 2023-12-21 14:09:37,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=72946.66666666667, ans=0.125 2023-12-21 14:10:01,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=73080.0, ans=0.0 2023-12-21 14:10:13,695 INFO [train.py:886] (1/4) Epoch 3, batch 1450, loss[loss=0.01605, audio_tagging_loss=0.01605, over 25000.00 frames. ], tot_loss[loss=0.01801, audio_tagging_loss=0.01801, over 4954175.18 frames. ], batch size: 100, lr: 3.15e-02, grad_scale: 64.0 2023-12-21 14:10:22,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=73213.33333333333, ans=0.0 2023-12-21 14:10:23,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.04 vs. limit=6.0 2023-12-21 14:10:36,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=73346.66666666667, ans=0.125 2023-12-21 14:10:38,205 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.554e+01 2.770e+01 3.046e+01 3.677e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-21 14:10:43,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73346.66666666667, ans=0.1 2023-12-21 14:10:59,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=73480.0, ans=0.125 2023-12-21 14:11:03,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=73480.0, ans=0.07 2023-12-21 14:11:04,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=73546.66666666667, ans=0.125 2023-12-21 14:11:05,634 INFO [train.py:886] (1/4) Epoch 3, batch 1500, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 4957805.70 frames. 
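[Note] The scaling.py ScheduledFloat entries record hyperparameters, dropout probabilities, skip rates and balancer limits, whose current value ("ans") is a function of the global batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; this is illustrative only, not the exact scaling.py implementation:

    class PiecewiseLinearSchedule:
        """Value that is piecewise-linear in the global batch count.
        Sketch only; icefall's ScheduledFloat has a richer interface."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
    # p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1)).value(batch_count)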
], batch size: 100, lr: 3.14e-02, grad_scale: 64.0 2023-12-21 14:11:31,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0 2023-12-21 14:11:49,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.29 vs. limit=22.5 2023-12-21 14:11:57,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=73880.0, ans=0.0 2023-12-21 14:11:58,011 INFO [train.py:886] (1/4) Epoch 3, batch 1550, loss[loss=0.01758, audio_tagging_loss=0.01758, over 24750.00 frames. ], tot_loss[loss=0.01802, audio_tagging_loss=0.01802, over 4954539.03 frames. ], batch size: 99, lr: 3.14e-02, grad_scale: 64.0 2023-12-21 14:12:15,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73946.66666666667, ans=0.1 2023-12-21 14:12:16,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=73946.66666666667, ans=0.125 2023-12-21 14:12:22,091 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.602e+01 2.823e+01 3.207e+01 3.964e+01, threshold=5.646e+01, percent-clipped=0.0 2023-12-21 14:12:33,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=74080.0, ans=0.0 2023-12-21 14:12:36,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=74080.0, ans=0.04949747468305833 2023-12-21 14:12:37,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.19 vs. limit=15.0 2023-12-21 14:12:39,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74146.66666666667, ans=0.1 2023-12-21 14:12:40,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=74146.66666666667, ans=0.125 2023-12-21 14:12:49,724 INFO [train.py:886] (1/4) Epoch 3, batch 1600, loss[loss=0.01669, audio_tagging_loss=0.01669, over 24750.00 frames. ], tot_loss[loss=0.01814, audio_tagging_loss=0.01814, over 4953210.98 frames. ], batch size: 99, lr: 3.13e-02, grad_scale: 64.0 2023-12-21 14:12:54,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=74213.33333333333, ans=0.09899494936611666 2023-12-21 14:13:02,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2023-12-21 14:13:04,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=74280.0, ans=0.0 2023-12-21 14:13:04,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=74280.0, ans=0.0 2023-12-21 14:13:10,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.99 vs. 
limit=22.5 2023-12-21 14:13:15,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=74346.66666666667, ans=0.0 2023-12-21 14:13:27,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=74413.33333333333, ans=0.0 2023-12-21 14:13:28,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2023-12-21 14:13:39,709 INFO [train.py:886] (1/4) Epoch 3, batch 1650, loss[loss=0.0154, audio_tagging_loss=0.0154, over 24750.00 frames. ], tot_loss[loss=0.01797, audio_tagging_loss=0.01797, over 4949212.53 frames. ], batch size: 99, lr: 3.13e-02, grad_scale: 64.0 2023-12-21 14:13:52,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0 2023-12-21 14:13:54,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-21 14:13:55,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=74613.33333333333, ans=0.125 2023-12-21 14:13:55,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=74613.33333333333, ans=0.125 2023-12-21 14:13:58,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=74613.33333333333, ans=0.125 2023-12-21 14:14:03,657 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.550e+01 2.728e+01 3.001e+01 3.650e+01, threshold=5.456e+01, percent-clipped=0.0 2023-12-21 14:14:06,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-21 14:14:15,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2023-12-21 14:14:18,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-12-21 14:14:30,601 INFO [train.py:886] (1/4) Epoch 3, batch 1700, loss[loss=0.01732, audio_tagging_loss=0.01732, over 24027.00 frames. ], tot_loss[loss=0.01792, audio_tagging_loss=0.01792, over 4946025.11 frames. 
], batch size: 100, lr: 3.12e-02, grad_scale: 64.0 2023-12-21 14:14:30,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=74880.0, ans=0.2 2023-12-21 14:14:47,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=74946.66666666667, ans=0.07 2023-12-21 14:15:02,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=75080.0, ans=0.125 2023-12-21 14:15:06,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=75080.0, ans=0.125 2023-12-21 14:15:17,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2023-12-21 14:15:22,039 INFO [train.py:886] (1/4) Epoch 3, batch 1750, loss[loss=0.01698, audio_tagging_loss=0.01698, over 25000.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4949315.52 frames. ], batch size: 100, lr: 3.12e-02, grad_scale: 64.0 2023-12-21 14:15:22,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0 2023-12-21 14:15:25,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=75213.33333333333, ans=0.125 2023-12-21 14:15:28,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75213.33333333333, ans=0.1 2023-12-21 14:15:44,389 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.559e+01 2.774e+01 3.008e+01 3.944e+01, threshold=5.548e+01, percent-clipped=0.0 2023-12-21 14:15:54,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=75413.33333333333, ans=0.0 2023-12-21 14:15:58,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=75413.33333333333, ans=0.07 2023-12-21 14:16:12,126 INFO [train.py:886] (1/4) Epoch 3, batch 1800, loss[loss=0.01818, audio_tagging_loss=0.01818, over 25000.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 4950081.30 frames. ], batch size: 100, lr: 3.11e-02, grad_scale: 64.0 2023-12-21 14:16:38,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=15.0 2023-12-21 14:16:39,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=75680.0, ans=0.2 2023-12-21 14:16:40,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.78 vs. limit=10.0 2023-12-21 14:16:43,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=15.0 2023-12-21 14:16:44,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.89 vs. 
limit=22.5 2023-12-21 14:16:44,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=75746.66666666667, ans=15.0 2023-12-21 14:16:49,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=75746.66666666667, ans=0.07 2023-12-21 14:17:04,503 INFO [train.py:886] (1/4) Epoch 3, batch 1850, loss[loss=0.01649, audio_tagging_loss=0.01649, over 25000.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 4948220.59 frames. ], batch size: 100, lr: 3.11e-02, grad_scale: 64.0 2023-12-21 14:17:27,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=76013.33333333333, ans=0.125 2023-12-21 14:17:28,690 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.592e+01 2.758e+01 3.000e+01 3.750e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-21 14:17:31,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2023-12-21 14:17:44,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.0 2023-12-21 14:17:45,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.51 vs. limit=15.0 2023-12-21 14:17:55,765 INFO [train.py:886] (1/4) Epoch 3, batch 1900, loss[loss=0.01769, audio_tagging_loss=0.01769, over 24750.00 frames. ], tot_loss[loss=0.01797, audio_tagging_loss=0.01797, over 4936400.18 frames. ], batch size: 99, lr: 3.11e-02, grad_scale: 64.0 2023-12-21 14:18:03,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=76213.33333333333, ans=0.125 2023-12-21 14:18:08,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=76280.0, ans=0.2 2023-12-21 14:18:40,327 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.601e-01 2023-12-21 14:18:46,718 INFO [train.py:886] (1/4) Epoch 3, batch 1950, loss[loss=0.02202, audio_tagging_loss=0.02202, over 24750.00 frames. ], tot_loss[loss=0.01797, audio_tagging_loss=0.01797, over 4933076.30 frames. ], batch size: 99, lr: 3.10e-02, grad_scale: 64.0 2023-12-21 14:18:56,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=76613.33333333333, ans=0.1 2023-12-21 14:19:02,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=12.0 2023-12-21 14:19:10,602 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.559e+01 2.784e+01 3.135e+01 5.134e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 14:19:21,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. 
limit=15.0 2023-12-21 14:19:23,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=76746.66666666667, ans=0.0 2023-12-21 14:19:30,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=76813.33333333333, ans=0.0 2023-12-21 14:19:37,602 INFO [train.py:886] (1/4) Epoch 3, batch 2000, loss[loss=0.01425, audio_tagging_loss=0.01425, over 24750.00 frames. ], tot_loss[loss=0.01789, audio_tagging_loss=0.01789, over 4929154.31 frames. ], batch size: 99, lr: 3.10e-02, grad_scale: 64.0 2023-12-21 14:19:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=76880.0, ans=0.0 2023-12-21 14:19:43,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.28 vs. limit=22.5 2023-12-21 14:19:45,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=76880.0, ans=0.125 2023-12-21 14:19:47,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=8.0 2023-12-21 14:19:51,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=76946.66666666667, ans=0.125 2023-12-21 14:19:55,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=77013.33333333333, ans=0.0 2023-12-21 14:20:15,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=77080.0, ans=0.5 2023-12-21 14:20:19,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-21 14:20:27,161 INFO [train.py:886] (1/4) Epoch 3, batch 2050, loss[loss=0.01783, audio_tagging_loss=0.01783, over 25000.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 4932791.39 frames. ], batch size: 100, lr: 3.09e-02, grad_scale: 128.0 2023-12-21 14:20:29,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-21 14:20:33,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2023-12-21 14:20:51,404 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.488e+01 2.688e+01 3.036e+01 3.855e+01, threshold=5.376e+01, percent-clipped=0.0 2023-12-21 14:20:51,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. 
limit=6.0 2023-12-21 14:20:58,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=77413.33333333333, ans=0.0 2023-12-21 14:21:07,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=77413.33333333333, ans=10.0 2023-12-21 14:21:09,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=77480.0, ans=0.125 2023-12-21 14:21:19,152 INFO [train.py:886] (1/4) Epoch 3, batch 2100, loss[loss=0.01777, audio_tagging_loss=0.01777, over 25000.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 4938921.34 frames. ], batch size: 100, lr: 3.09e-02, grad_scale: 128.0 2023-12-21 14:21:22,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77546.66666666667, ans=0.1 2023-12-21 14:21:31,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=77613.33333333333, ans=0.2 2023-12-21 14:21:36,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-12-21 14:21:38,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77680.0, ans=0.1 2023-12-21 14:21:42,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=77680.0, ans=0.0 2023-12-21 14:21:47,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=77680.0, ans=0.0 2023-12-21 14:22:07,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-12-21 14:22:08,611 INFO [train.py:886] (1/4) Epoch 3, batch 2150, loss[loss=0.01726, audio_tagging_loss=0.01726, over 25000.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4946300.39 frames. ], batch size: 100, lr: 3.08e-02, grad_scale: 128.0 2023-12-21 14:22:27,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=15.0 2023-12-21 14:22:28,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2023-12-21 14:22:32,516 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.494e+01 2.667e+01 2.941e+01 3.537e+01, threshold=5.333e+01, percent-clipped=0.0 2023-12-21 14:22:37,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78013.33333333333, ans=0.1 2023-12-21 14:22:38,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=78013.33333333333, ans=0.125 2023-12-21 14:22:46,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=78080.0, ans=0.125 2023-12-21 14:22:48,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. 
limit=15.0 2023-12-21 14:22:50,724 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.179e+00 2023-12-21 14:23:01,006 INFO [train.py:886] (1/4) Epoch 3, batch 2200, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01772, audio_tagging_loss=0.01772, over 4943213.29 frames. ], batch size: 99, lr: 3.08e-02, grad_scale: 128.0 2023-12-21 14:23:08,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=78213.33333333333, ans=10.0 2023-12-21 14:23:22,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2023-12-21 14:23:28,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=78346.66666666667, ans=0.0 2023-12-21 14:23:41,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.19 vs. limit=22.5 2023-12-21 14:23:53,460 INFO [train.py:886] (1/4) Epoch 3, batch 2250, loss[loss=0.01879, audio_tagging_loss=0.01879, over 24750.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 4941860.29 frames. ], batch size: 99, lr: 3.07e-02, grad_scale: 128.0 2023-12-21 14:23:55,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=78546.66666666667, ans=0.0 2023-12-21 14:23:56,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.30 vs. limit=22.5 2023-12-21 14:24:16,833 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.575e+01 2.774e+01 3.056e+01 3.945e+01, threshold=5.548e+01, percent-clipped=0.0 2023-12-21 14:24:17,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.10 vs. limit=22.5 2023-12-21 14:24:26,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=78746.66666666667, ans=0.125 2023-12-21 14:24:31,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=78746.66666666667, ans=0.125 2023-12-21 14:24:31,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=78746.66666666667, ans=0.05 2023-12-21 14:24:31,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=78746.66666666667, ans=0.025 2023-12-21 14:24:35,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=78813.33333333333, ans=0.0 2023-12-21 14:24:43,249 INFO [train.py:886] (1/4) Epoch 3, batch 2300, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.0177, audio_tagging_loss=0.0177, over 4940488.58 frames. 
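[Note] grad_scale in the batch summaries is the fp16 loss scale, and its doubling from 64.0 (through batch 2000 above) to 128.0 (from batch 2050) matches PyTorch AMP-style dynamic loss scaling, where the scale is multiplied by growth_factor after growth_interval consecutive overflow-free steps. A generic sketch of that loop using the stock torch.cuda.amp API (icefall's actual train step differs in its details):

    import torch

    scaler = torch.cuda.amp.GradScaler(growth_factor=2.0, growth_interval=2000)

    def train_step(model, batch, optimizer, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()   # backprop on the scaled loss
        scaler.step(optimizer)          # unscales grads, skips step on inf/nan
        scaler.update()                 # grows or shrinks the scale
        return loss.detach()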
], batch size: 100, lr: 3.07e-02, grad_scale: 128.0 2023-12-21 14:25:06,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=79013.33333333333, ans=0.0 2023-12-21 14:25:10,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.30 vs. limit=22.5 2023-12-21 14:25:35,572 INFO [train.py:886] (1/4) Epoch 3, batch 2350, loss[loss=0.01834, audio_tagging_loss=0.01834, over 25000.00 frames. ], tot_loss[loss=0.01772, audio_tagging_loss=0.01772, over 4942247.29 frames. ], batch size: 100, lr: 3.06e-02, grad_scale: 128.0 2023-12-21 14:25:38,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=79213.33333333333, ans=0.0 2023-12-21 14:25:40,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=79213.33333333333, ans=0.125 2023-12-21 14:25:48,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=12.0 2023-12-21 14:25:50,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=79280.0, ans=0.0 2023-12-21 14:25:56,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=79346.66666666667, ans=0.125 2023-12-21 14:25:58,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=79346.66666666667, ans=0.125 2023-12-21 14:25:59,423 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.547e+01 2.838e+01 3.114e+01 3.985e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-21 14:26:01,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=12.0 2023-12-21 14:26:02,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.35 vs. limit=22.5 2023-12-21 14:26:09,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.34 vs. limit=10.0 2023-12-21 14:26:15,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=79480.0, ans=0.1 2023-12-21 14:26:17,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=79480.0, ans=0.125 2023-12-21 14:26:24,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=79480.0, ans=0.0 2023-12-21 14:26:26,853 INFO [train.py:886] (1/4) Epoch 3, batch 2400, loss[loss=0.01595, audio_tagging_loss=0.01595, over 25000.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 4942516.96 frames. 
], batch size: 100, lr: 3.06e-02, grad_scale: 128.0 2023-12-21 14:26:27,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=79546.66666666667, ans=0.0 2023-12-21 14:26:33,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=79546.66666666667, ans=0.2 2023-12-21 14:26:36,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=79613.33333333333, ans=0.125 2023-12-21 14:26:37,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.10 vs. limit=10.0 2023-12-21 14:26:42,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-12-21 14:26:47,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=79680.0, ans=0.125 2023-12-21 14:27:01,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=79746.66666666667, ans=0.0 2023-12-21 14:27:12,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=79813.33333333333, ans=0.07 2023-12-21 14:27:17,482 INFO [train.py:886] (1/4) Epoch 3, batch 2450, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 4947465.58 frames. ], batch size: 100, lr: 3.05e-02, grad_scale: 128.0 2023-12-21 14:27:19,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=79880.0, ans=0.125 2023-12-21 14:27:43,404 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.542e+01 2.747e+01 3.025e+01 4.680e+01, threshold=5.493e+01, percent-clipped=0.0 2023-12-21 14:28:11,039 INFO [train.py:886] (1/4) Epoch 3, batch 2500, loss[loss=0.01594, audio_tagging_loss=0.01594, over 24750.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4942451.43 frames. ], batch size: 99, lr: 3.05e-02, grad_scale: 128.0 2023-12-21 14:28:24,462 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.356e-02 2023-12-21 14:28:47,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80413.33333333333, ans=0.1 2023-12-21 14:28:50,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=12.0 2023-12-21 14:28:54,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2023-12-21 14:29:01,852 INFO [train.py:886] (1/4) Epoch 3, batch 2550, loss[loss=0.01677, audio_tagging_loss=0.01677, over 24750.00 frames. ], tot_loss[loss=0.01803, audio_tagging_loss=0.01803, over 4938551.20 frames. 
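[Note] tot_loss in the batch summaries is not the raw batch loss but a smoothed, frame-weighted average: the frame count it is reported over hovers near 5M (roughly 25,000 frames per batch times a window of a couple hundred batches) rather than growing without bound, which points to a decaying accumulator. A sketch of a tracker with that behaviour; the window size is an assumption read off the logged frame counts, not a confirmed icefall parameter:

    class DecayedLoss:
        """Hypothetical decaying frame-weighted loss tracker that would
        produce the 'tot_loss ... over ~5e6 frames' pattern above."""
        def __init__(self, window: int = 200):
            self.alpha = 1.0 - 1.0 / window
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.alpha * self.loss_sum + batch_loss * batch_frames
            self.frames = self.alpha * self.frames + batch_frames

        @property
        def average(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

At ~25,000 frames per batch, the steady-state frame count of this tracker is 25,000 / (1 - alpha) = 5,000,000, which is consistent with the ~4.94M figures logged here.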
], batch size: 99, lr: 3.05e-02, grad_scale: 128.0 2023-12-21 14:29:07,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=80546.66666666667, ans=0.0 2023-12-21 14:29:17,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=15.0 2023-12-21 14:29:24,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=80680.0, ans=0.2 2023-12-21 14:29:25,738 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.659e+01 2.860e+01 3.083e+01 5.383e+01, threshold=5.720e+01, percent-clipped=0.0 2023-12-21 14:29:26,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=80680.0, ans=0.0 2023-12-21 14:29:42,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=80746.66666666667, ans=0.125 2023-12-21 14:29:52,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=80813.33333333333, ans=10.0 2023-12-21 14:29:54,294 INFO [train.py:886] (1/4) Epoch 3, batch 2600, loss[loss=0.01873, audio_tagging_loss=0.01873, over 24750.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 4939968.23 frames. ], batch size: 99, lr: 3.04e-02, grad_scale: 128.0 2023-12-21 14:30:12,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-21 14:30:17,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2023-12-21 14:30:19,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=81013.33333333333, ans=0.125 2023-12-21 14:30:24,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.33 vs. limit=22.5 2023-12-21 14:30:26,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=81080.0, ans=0.0 2023-12-21 14:30:34,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=81146.66666666667, ans=0.125 2023-12-21 14:30:46,925 INFO [train.py:886] (1/4) Epoch 3, batch 2650, loss[loss=0.01562, audio_tagging_loss=0.01562, over 25000.00 frames. ], tot_loss[loss=0.01776, audio_tagging_loss=0.01776, over 4943174.95 frames. ], batch size: 100, lr: 3.04e-02, grad_scale: 128.0 2023-12-21 14:31:09,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=81346.66666666667, ans=0.1 2023-12-21 14:31:10,400 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.677e+01 2.899e+01 3.171e+01 4.559e+01, threshold=5.799e+01, percent-clipped=0.0 2023-12-21 14:31:35,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=12.0 2023-12-21 14:31:38,088 INFO [train.py:886] (1/4) Epoch 3, batch 2700, loss[loss=0.01614, audio_tagging_loss=0.01614, over 25000.00 frames. 
], tot_loss[loss=0.01763, audio_tagging_loss=0.01763, over 4948505.87 frames. ], batch size: 100, lr: 3.03e-02, grad_scale: 128.0 2023-12-21 14:31:53,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=81613.33333333333, ans=0.0 2023-12-21 14:32:21,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=81813.33333333333, ans=0.0 2023-12-21 14:32:22,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2023-12-21 14:32:29,883 INFO [train.py:886] (1/4) Epoch 3, batch 2750, loss[loss=0.018, audio_tagging_loss=0.018, over 25000.00 frames. ], tot_loss[loss=0.01759, audio_tagging_loss=0.01759, over 4956335.50 frames. ], batch size: 100, lr: 3.03e-02, grad_scale: 128.0 2023-12-21 14:32:32,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=81880.0, ans=0.0 2023-12-21 14:32:45,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=81946.66666666667, ans=0.0 2023-12-21 14:32:53,652 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.528e+01 2.695e+01 2.960e+01 4.010e+01, threshold=5.390e+01, percent-clipped=0.0 2023-12-21 14:32:53,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=82013.33333333333, ans=0.125 2023-12-21 14:33:10,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=82146.66666666667, ans=0.2 2023-12-21 14:33:11,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=12.0 2023-12-21 14:33:20,861 INFO [train.py:886] (1/4) Epoch 3, batch 2800, loss[loss=0.01926, audio_tagging_loss=0.01926, over 24750.00 frames. ], tot_loss[loss=0.01764, audio_tagging_loss=0.01764, over 4955854.67 frames. ], batch size: 99, lr: 3.02e-02, grad_scale: 128.0 2023-12-21 14:33:33,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=82280.0, ans=0.0 2023-12-21 14:33:34,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=82280.0, ans=0.125 2023-12-21 14:33:36,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.74 vs. limit=22.5 2023-12-21 14:34:05,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=82480.0, ans=0.125 2023-12-21 14:34:12,666 INFO [train.py:886] (1/4) Epoch 3, batch 2850, loss[loss=0.0172, audio_tagging_loss=0.0172, over 24750.00 frames. ], tot_loss[loss=0.0178, audio_tagging_loss=0.0178, over 4954876.58 frames. ], batch size: 99, lr: 3.02e-02, grad_scale: 128.0 2023-12-21 14:34:15,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.43 vs. 
limit=15.0 2023-12-21 14:34:36,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.481e+01 2.766e+01 3.054e+01 4.031e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-21 14:34:40,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0 2023-12-21 14:35:01,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=82813.33333333333, ans=0.0 2023-12-21 14:35:04,877 INFO [train.py:886] (1/4) Epoch 3, batch 2900, loss[loss=0.01851, audio_tagging_loss=0.01851, over 24750.00 frames. ], tot_loss[loss=0.01769, audio_tagging_loss=0.01769, over 4943349.69 frames. ], batch size: 99, lr: 3.01e-02, grad_scale: 128.0 2023-12-21 14:35:06,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82880.0, ans=0.1 2023-12-21 14:35:10,821 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.877e+00 2023-12-21 14:35:34,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=83013.33333333333, ans=0.04949747468305833 2023-12-21 14:35:39,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=83080.0, ans=0.125 2023-12-21 14:35:43,791 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.163e+00 2023-12-21 14:35:48,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-12-21 14:35:56,377 INFO [train.py:886] (1/4) Epoch 3, batch 2950, loss[loss=0.01744, audio_tagging_loss=0.01744, over 25000.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4945942.88 frames. ], batch size: 100, lr: 3.01e-02, grad_scale: 128.0 2023-12-21 14:35:56,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=83213.33333333333, ans=0.0 2023-12-21 14:36:04,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=83213.33333333333, ans=0.125 2023-12-21 14:36:04,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-12-21 14:36:06,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. 
limit=15.0 2023-12-21 14:36:08,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=83280.0, ans=0.0 2023-12-21 14:36:12,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83280.0, ans=0.125 2023-12-21 14:36:15,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=83346.66666666667, ans=0.125 2023-12-21 14:36:19,501 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.548e+01 2.774e+01 2.996e+01 3.948e+01, threshold=5.549e+01, percent-clipped=0.0 2023-12-21 14:36:31,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2023-12-21 14:36:47,854 INFO [train.py:886] (1/4) Epoch 3, batch 3000, loss[loss=0.01808, audio_tagging_loss=0.01808, over 25000.00 frames. ], tot_loss[loss=0.01745, audio_tagging_loss=0.01745, over 4952118.19 frames. ], batch size: 100, lr: 3.00e-02, grad_scale: 128.0 2023-12-21 14:36:47,855 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 14:37:09,000 INFO [train.py:917] (1/4) Epoch 3, validation: loss=0.04203, audio_tagging_loss=0.04203, over 3737520.00 frames. 2023-12-21 14:37:09,001 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 14:37:09,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=83546.66666666667, ans=0.125 2023-12-21 14:37:39,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=83746.66666666667, ans=0.125 2023-12-21 14:37:50,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=83813.33333333333, ans=0.0 2023-12-21 14:37:52,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=83813.33333333333, ans=0.0 2023-12-21 14:37:55,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2023-12-21 14:37:59,633 INFO [train.py:886] (1/4) Epoch 3, batch 3050, loss[loss=0.01574, audio_tagging_loss=0.01574, over 25000.00 frames. ], tot_loss[loss=0.01751, audio_tagging_loss=0.01751, over 4950907.64 frames. 
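[Note] Partway through this stretch the trainer pauses at epoch 3, batch 3000 to compute a validation loss over the full dev set (3,737,520 frames) and then logs peak CUDA memory, which is what torch.cuda.max_memory_allocated() reports, printed here in MB. A hedged sketch of such a validation pass; the dataloader, field names, and frame weighting are placeholders, not icefall's actual symbols:

    import torch

    @torch.no_grad()
    def run_validation(model, dev_loader, criterion, device):
        """Sketch of the periodic validation pass. Losses are weighted by
        batch items here for simplicity; the log weights by frames."""
        model.eval()
        loss_sum, count = 0.0, 0.0
        for batch in dev_loader:
            feats = batch["inputs"].to(device)
            labels = batch["targets"].to(device)
            loss = criterion(model(feats), labels)
            loss_sum += loss.item() * feats.size(0)
            count += feats.size(0)
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={loss_sum / count:.5f}; max memory so far {mb}MB")
        return loss_sum / count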
], batch size: 100, lr: 3.00e-02, grad_scale: 128.0 2023-12-21 14:38:00,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=83880.0, ans=0.125 2023-12-21 14:38:02,392 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.267e+00 2023-12-21 14:38:14,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=83946.66666666667, ans=0.0 2023-12-21 14:38:15,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=83946.66666666667, ans=0.125 2023-12-21 14:38:23,516 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.446e+01 2.688e+01 2.913e+01 3.941e+01, threshold=5.377e+01, percent-clipped=0.0 2023-12-21 14:38:35,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2023-12-21 14:38:46,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=84146.66666666667, ans=0.125 2023-12-21 14:38:47,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=84146.66666666667, ans=0.125 2023-12-21 14:38:51,852 INFO [train.py:886] (1/4) Epoch 3, batch 3100, loss[loss=0.01704, audio_tagging_loss=0.01704, over 25000.00 frames. ], tot_loss[loss=0.01749, audio_tagging_loss=0.01749, over 4953225.49 frames. ], batch size: 100, lr: 3.00e-02, grad_scale: 128.0 2023-12-21 14:38:55,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=15.0 2023-12-21 14:38:59,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2023-12-21 14:38:59,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84213.33333333333, ans=0.1 2023-12-21 14:39:36,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=84480.0, ans=0.125 2023-12-21 14:39:44,557 INFO [train.py:886] (1/4) Epoch 3, batch 3150, loss[loss=0.01863, audio_tagging_loss=0.01863, over 24750.00 frames. ], tot_loss[loss=0.01755, audio_tagging_loss=0.01755, over 4945107.95 frames. ], batch size: 99, lr: 2.99e-02, grad_scale: 128.0 2023-12-21 14:40:06,859 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.586e+01 2.797e+01 3.017e+01 3.878e+01, threshold=5.594e+01, percent-clipped=0.0 2023-12-21 14:40:18,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=84746.66666666667, ans=0.2 2023-12-21 14:40:31,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=84813.33333333333, ans=10.0 2023-12-21 14:40:34,477 INFO [train.py:886] (1/4) Epoch 3, batch 3200, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01758, audio_tagging_loss=0.01758, over 4948059.11 frames. 
], batch size: 99, lr: 2.99e-02, grad_scale: 128.0
2023-12-21 14:40:53,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0
2023-12-21 14:40:57,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=85013.33333333333, ans=0.125
2023-12-21 14:41:01,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=85013.33333333333, ans=0.125
2023-12-21 14:41:02,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=85013.33333333333, ans=0.2
2023-12-21 14:41:11,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.09 vs. limit=22.5
2023-12-21 14:41:19,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=85146.66666666667, ans=0.125
2023-12-21 14:41:20,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=85146.66666666667, ans=0.125
2023-12-21 14:41:24,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85146.66666666667, ans=0.1
2023-12-21 14:41:27,346 INFO [train.py:886] (1/4) Epoch 3, batch 3250, loss[loss=0.01984, audio_tagging_loss=0.01984, over 25000.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4949186.43 frames. ], batch size: 100, lr: 2.98e-02, grad_scale: 128.0
2023-12-21 14:41:27,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85213.33333333333, ans=0.1
2023-12-21 14:41:46,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85346.66666666667, ans=0.1
2023-12-21 14:41:51,356 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.547e+01 2.711e+01 3.063e+01 4.521e+01, threshold=5.423e+01, percent-clipped=0.0
2023-12-21 14:41:57,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=22.15 vs. limit=22.5
2023-12-21 14:42:10,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=85480.0, ans=0.2
2023-12-21 14:42:17,717 INFO [train.py:886] (1/4) Epoch 3, batch 3300, loss[loss=0.01805, audio_tagging_loss=0.01805, over 25000.00 frames. ], tot_loss[loss=0.01741, audio_tagging_loss=0.01741, over 4947887.21 frames. ], batch size: 100, lr: 2.98e-02, grad_scale: 128.0
2023-12-21 14:42:39,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.18 vs. limit=15.0
2023-12-21 14:42:49,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=85746.66666666667, ans=0.0
2023-12-21 14:42:53,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0
2023-12-21 14:43:01,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=85813.33333333333, ans=0.125
2023-12-21 14:43:10,636 INFO [train.py:886] (1/4) Epoch 3, batch 3350, loss[loss=0.01683, audio_tagging_loss=0.01683, over 25000.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4951606.56 frames. ], batch size: 100, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:43:34,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.545e+01 2.748e+01 2.993e+01 3.826e+01, threshold=5.495e+01, percent-clipped=0.0
2023-12-21 14:43:55,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=86146.66666666667, ans=0.2
2023-12-21 14:43:59,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=86146.66666666667, ans=0.125
2023-12-21 14:44:02,989 INFO [train.py:886] (1/4) Epoch 3, batch 3400, loss[loss=0.01708, audio_tagging_loss=0.01708, over 25000.00 frames. ], tot_loss[loss=0.01746, audio_tagging_loss=0.01746, over 4956006.98 frames. ], batch size: 100, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:44:07,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=86213.33333333333, ans=0.125
2023-12-21 14:44:30,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2023-12-21 14:44:46,578 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.548e-03
2023-12-21 14:44:48,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=86480.0, ans=0.2
2023-12-21 14:44:48,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=15.0
2023-12-21 14:44:53,107 INFO [train.py:886] (1/4) Epoch 3, batch 3450, loss[loss=0.01606, audio_tagging_loss=0.01606, over 24750.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 4948202.01 frames. ], batch size: 99, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:44:54,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=86546.66666666667, ans=0.0
2023-12-21 14:45:18,159 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.604e+01 2.823e+01 3.074e+01 4.024e+01, threshold=5.647e+01, percent-clipped=0.0
2023-12-21 14:45:19,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=86680.0, ans=0.2
2023-12-21 14:45:19,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.19 vs. limit=22.5
2023-12-21 14:45:22,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=86680.0, ans=0.125
2023-12-21 14:45:22,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0
2023-12-21 14:45:23,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=86680.0, ans=0.0
2023-12-21 14:45:27,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0
2023-12-21 14:45:32,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.07 vs. limit=15.0
2023-12-21 14:45:41,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=86813.33333333333, ans=0.2
2023-12-21 14:45:44,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=86813.33333333333, ans=0.0
2023-12-21 14:45:46,445 INFO [train.py:886] (1/4) Epoch 3, batch 3500, loss[loss=0.01891, audio_tagging_loss=0.01891, over 24750.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4949466.40 frames. ], batch size: 99, lr: 2.96e-02, grad_scale: 128.0
2023-12-21 14:45:51,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0
2023-12-21 14:45:55,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0
2023-12-21 14:45:58,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=86946.66666666667, ans=0.125
2023-12-21 14:46:00,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0
2023-12-21 14:46:08,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=87013.33333333333, ans=0.1
2023-12-21 14:46:30,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.76 vs. limit=15.0
2023-12-21 14:46:38,194 INFO [train.py:886] (1/4) Epoch 3, batch 3550, loss[loss=0.01634, audio_tagging_loss=0.01634, over 25000.00 frames. ], tot_loss[loss=0.01762, audio_tagging_loss=0.01762, over 4947336.74 frames. ], batch size: 100, lr: 2.96e-02, grad_scale: 128.0
2023-12-21 14:46:56,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=87280.0, ans=0.125
2023-12-21 14:47:00,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=87346.66666666667, ans=0.125
2023-12-21 14:47:01,416 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.500e+01 2.743e+01 3.014e+01 4.154e+01, threshold=5.485e+01, percent-clipped=0.0
2023-12-21 14:47:04,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5
2023-12-21 14:47:18,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87413.33333333333, ans=0.1
2023-12-21 14:47:28,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=87480.0, ans=0.125
2023-12-21 14:47:29,939 INFO [train.py:886] (1/4) Epoch 3, batch 3600, loss[loss=0.0163, audio_tagging_loss=0.0163, over 25000.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 4944359.48 frames. ], batch size: 100, lr: 2.95e-02, grad_scale: 128.0
2023-12-21 14:47:47,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=87613.33333333333, ans=0.125
2023-12-21 14:47:59,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5
2023-12-21 14:48:11,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=87813.33333333333, ans=0.0
2023-12-21 14:48:22,032 INFO [train.py:886] (1/4) Epoch 3, batch 3650, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4953848.24 frames. ], batch size: 100, lr: 2.95e-02, grad_scale: 128.0
2023-12-21 14:48:24,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=87880.0, ans=0.125
2023-12-21 14:48:27,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87880.0, ans=0.1
2023-12-21 14:48:45,863 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.549e+01 2.738e+01 3.044e+01 4.294e+01, threshold=5.477e+01, percent-clipped=0.0
2023-12-21 14:48:51,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0
2023-12-21 14:48:51,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0
2023-12-21 14:49:06,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=88146.66666666667, ans=0.125
2023-12-21 14:49:13,680 INFO [train.py:886] (1/4) Epoch 3, batch 3700, loss[loss=0.01952, audio_tagging_loss=0.01952, over 24750.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4958740.30 frames. ], batch size: 99, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:49:22,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=88213.33333333333, ans=0.2
2023-12-21 14:49:22,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0
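The ScheduledFloat entries above report module hyperparameters (dropout probabilities, skip rates, balancer bounds) that are annealed as a function of batch_count. A minimal sketch of such a piecewise-linear schedule is below; the class and constructor points are illustrative assumptions, not necessarily the exact API of icefall's scaling.py:

    # Sketch of a piecewise-linear schedule like the one behind the
    # "ScheduledFloat: ... batch_count=..., ans=..." lines above. Illustrative only.
    from bisect import bisect_right

    class PiecewiseLinearSchedule:
        def __init__(self, *points):
            # points: (batch_count, value) pairs sorted by batch_count,
            # e.g. (0.0, 0.3), (20000.0, 0.1) for a dropout that decays.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            # Clamp outside the range, interpolate linearly inside it.
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(87413.33))  # past the final point, so 0.1 -- cf. ans=0.1 above

With assumed endpoints (0.0, 0.3) and (20000.0, 0.1), this reproduces the pattern visible in the log: dropout_p is 0.3 at batch_count=0.0 and has settled at 0.1 by the batch counts shown here.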
2023-12-21 14:49:40,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88346.66666666667, ans=0.125
2023-12-21 14:49:45,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=88413.33333333333, ans=0.125
2023-12-21 14:49:45,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=88413.33333333333, ans=0.125
2023-12-21 14:49:51,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.61 vs. limit=22.5
2023-12-21 14:50:02,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=15.0
2023-12-21 14:50:05,627 INFO [train.py:886] (1/4) Epoch 3, batch 3750, loss[loss=0.01992, audio_tagging_loss=0.01992, over 24750.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4956698.44 frames. ], batch size: 99, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:50:12,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.50 vs. limit=22.5
2023-12-21 14:50:22,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.58 vs. limit=22.5
2023-12-21 14:50:24,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=88613.33333333333, ans=0.125
2023-12-21 14:50:27,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=88680.0, ans=0.125
2023-12-21 14:50:29,286 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.571e+01 2.734e+01 2.974e+01 3.491e+01, threshold=5.468e+01, percent-clipped=0.0
2023-12-21 14:50:33,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=88680.0, ans=0.0
2023-12-21 14:50:41,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=88746.66666666667, ans=0.125
2023-12-21 14:50:57,881 INFO [train.py:886] (1/4) Epoch 3, batch 3800, loss[loss=0.01976, audio_tagging_loss=0.01976, over 24750.00 frames. ], tot_loss[loss=0.01764, audio_tagging_loss=0.01764, over 4953247.92 frames. ], batch size: 99, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:50:58,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.19 vs. limit=15.0
2023-12-21 14:51:14,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=88946.66666666667, ans=10.0
2023-12-21 14:51:18,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=89013.33333333333, ans=0.05
2023-12-21 14:51:29,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89080.0, ans=0.1
2023-12-21 14:51:49,282 INFO [train.py:886] (1/4) Epoch 3, batch 3850, loss[loss=0.01733, audio_tagging_loss=0.01733, over 24750.00 frames. ], tot_loss[loss=0.01757, audio_tagging_loss=0.01757, over 4944061.89 frames. ], batch size: 99, lr: 2.93e-02, grad_scale: 128.0
2023-12-21 14:51:50,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89213.33333333333, ans=0.1
2023-12-21 14:51:58,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=89280.0, ans=0.09899494936611666
2023-12-21 14:52:12,669 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.537e+01 2.733e+01 2.945e+01 4.366e+01, threshold=5.467e+01, percent-clipped=0.0
2023-12-21 14:52:14,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=89346.66666666667, ans=0.2
2023-12-21 14:52:20,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89413.33333333333, ans=0.1
2023-12-21 14:52:22,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=89413.33333333333, ans=0.0
2023-12-21 14:52:35,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2023-12-21 14:52:40,931 INFO [train.py:886] (1/4) Epoch 3, batch 3900, loss[loss=0.01713, audio_tagging_loss=0.01713, over 24750.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 4945289.54 frames. ], batch size: 99, lr: 2.93e-02, grad_scale: 128.0
2023-12-21 14:52:44,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=89546.66666666667, ans=0.0
2023-12-21 14:52:52,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0
2023-12-21 14:53:11,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0
2023-12-21 14:53:17,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=89746.66666666667, ans=0.2
2023-12-21 14:53:33,206 INFO [train.py:886] (1/4) Epoch 3, batch 3950, loss[loss=0.01681, audio_tagging_loss=0.01681, over 25000.00 frames. ], tot_loss[loss=0.01741, audio_tagging_loss=0.01741, over 4948593.41 frames. ], batch size: 100, lr: 2.92e-02, grad_scale: 128.0
2023-12-21 14:53:51,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89946.66666666667, ans=0.1
2023-12-21 14:53:57,088 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.521e+01 2.746e+01 2.975e+01 3.800e+01, threshold=5.493e+01, percent-clipped=0.0
2023-12-21 14:54:15,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2023-12-21 14:54:24,075 INFO [train.py:886] (1/4) Epoch 3, batch 4000, loss[loss=0.01989, audio_tagging_loss=0.01989, over 25000.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4957833.42 frames. ], batch size: 100, lr: 2.92e-02, grad_scale: 128.0
2023-12-21 14:54:27,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90213.33333333333, ans=0.125
2023-12-21 14:54:51,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=90346.66666666667, ans=0.125
2023-12-21 14:54:51,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=90346.66666666667, ans=0.0
2023-12-21 14:55:05,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=90480.0, ans=0.1
2023-12-21 14:55:16,291 INFO [train.py:886] (1/4) Epoch 3, batch 4050, loss[loss=0.0174, audio_tagging_loss=0.0174, over 24750.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4958841.18 frames. ], batch size: 99, lr: 2.92e-02, grad_scale: 256.0
2023-12-21 14:55:32,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=90613.33333333333, ans=0.125
2023-12-21 14:55:33,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=90613.33333333333, ans=0.125
2023-12-21 14:55:39,791 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.624e+01 2.857e+01 3.103e+01 4.116e+01, threshold=5.714e+01, percent-clipped=0.0
2023-12-21 14:55:43,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=90680.0, ans=0.125
2023-12-21 14:55:51,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0
2023-12-21 14:55:59,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=15.0
2023-12-21 14:56:00,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90813.33333333333, ans=0.1
2023-12-21 14:56:06,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=90880.0, ans=0.125
2023-12-21 14:56:08,207 INFO [train.py:886] (1/4) Epoch 3, batch 4100, loss[loss=0.02008, audio_tagging_loss=0.02008, over 24750.00 frames. ], tot_loss[loss=0.01769, audio_tagging_loss=0.01769, over 4952321.15 frames. ], batch size: 99, lr: 2.91e-02, grad_scale: 256.0
2023-12-21 14:56:20,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=90946.66666666667, ans=0.125
2023-12-21 14:56:29,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2023-12-21 14:56:32,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=91013.33333333333, ans=0.125
2023-12-21 14:56:43,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=91080.0, ans=0.125
2023-12-21 14:56:54,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.53 vs. limit=15.0
2023-12-21 14:56:57,958 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.371e-01
2023-12-21 14:56:59,603 INFO [train.py:886] (1/4) Epoch 3, batch 4150, loss[loss=0.01825, audio_tagging_loss=0.01825, over 24750.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 4950992.78 frames. ], batch size: 99, lr: 2.91e-02, grad_scale: 256.0
2023-12-21 14:57:02,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91213.33333333333, ans=0.125
2023-12-21 14:57:07,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.71 vs. limit=15.0
2023-12-21 14:57:09,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5
2023-12-21 14:57:22,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=91346.66666666667, ans=0.125
2023-12-21 14:57:24,203 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.605e+01 2.901e+01 3.180e+01 4.178e+01, threshold=5.801e+01, percent-clipped=0.0
2023-12-21 14:57:27,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=91346.66666666667, ans=0.125
2023-12-21 14:57:32,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91413.33333333333, ans=0.1
2023-12-21 14:57:44,251 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.276e+01
2023-12-21 14:57:49,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91480.0, ans=0.1
2023-12-21 14:57:52,740 INFO [train.py:886] (1/4) Epoch 3, batch 4200, loss[loss=0.01818, audio_tagging_loss=0.01818, over 24750.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4952414.85 frames. ], batch size: 99, lr: 2.90e-02, grad_scale: 256.0
2023-12-21 14:58:05,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=91613.33333333333, ans=0.125
2023-12-21 14:58:08,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=91613.33333333333, ans=0.125
2023-12-21 14:58:19,664 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.339e+00
2023-12-21 14:58:26,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0
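The Whitening lines compare a per-module covariance statistic ("metric") against a limit, with a corrective penalty applied only when the metric exceeds the limit. One plausible statistic with this behavior, sketched below, is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue: it equals 1.0 for perfectly whitened features and grows as the eigenvalue spread widens. This is an assumption for illustration, not necessarily the exact computation at scaling.py:1022:

    # Sketch: a whitening metric that is 1.0 when the feature covariance is a
    # multiple of the identity and grows with eigenvalue spread. Illustrative.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations from one module.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) covariance
        d = cov.shape[0]
        # mean(eig^2) / mean(eig)^2 == d * ||cov||_F^2 / trace(cov)^2
        return (d * (cov ** 2).sum() / cov.trace() ** 2).item()

    feats = torch.randn(1000, 384)              # near-white random features
    print(whitening_metric(feats))              # ~1.4 (sampling noise only),
                                                # well under limits like 15.0

Under this reading, a line such as "metric=22.96 vs. limit=22.5" marks a module whose activations have just crossed the allowed spread, which is why those lines appear only sporadically.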
2023-12-21 14:58:27,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=91746.66666666667, ans=0.04949747468305833
2023-12-21 14:58:28,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=15.0
2023-12-21 14:58:30,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91746.66666666667, ans=0.1
2023-12-21 14:58:35,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=91813.33333333333, ans=0.125
2023-12-21 14:58:42,615 INFO [train.py:886] (1/4) Epoch 3, batch 4250, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01749, audio_tagging_loss=0.01749, over 4952111.06 frames. ], batch size: 99, lr: 2.90e-02, grad_scale: 256.0
2023-12-21 14:58:59,917 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.968e+00
2023-12-21 14:59:06,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0
2023-12-21 14:59:07,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.544e+01 2.694e+01 3.014e+01 4.277e+01, threshold=5.388e+01, percent-clipped=0.0
2023-12-21 14:59:17,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=92080.0, ans=0.125
2023-12-21 14:59:19,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.95 vs. limit=10.0
2023-12-21 14:59:22,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92080.0, ans=0.1
2023-12-21 14:59:26,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=92146.66666666667, ans=0.0
2023-12-21 14:59:27,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=92146.66666666667, ans=0.125
2023-12-21 14:59:30,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0
2023-12-21 14:59:36,002 INFO [train.py:886] (1/4) Epoch 3, batch 4300, loss[loss=0.02055, audio_tagging_loss=0.02055, over 25000.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 4951378.20 frames. ], batch size: 100, lr: 2.89e-02, grad_scale: 128.0
2023-12-21 14:59:36,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=92213.33333333333, ans=0.125
2023-12-21 14:59:41,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=92213.33333333333, ans=0.125
2023-12-21 14:59:44,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=92280.0, ans=0.125
2023-12-21 14:59:48,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=92280.0, ans=0.125
2023-12-21 14:59:51,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=92280.0, ans=0.07
2023-12-21 15:00:03,528 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 15:00:05,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=92346.66666666667, ans=0.1
2023-12-21 15:00:22,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=92480.0, ans=0.125
2023-12-21 15:00:26,775 INFO [train.py:886] (1/4) Epoch 3, batch 4350, loss[loss=0.01688, audio_tagging_loss=0.01688, over 25000.00 frames. ], tot_loss[loss=0.01747, audio_tagging_loss=0.01747, over 4953391.36 frames. ], batch size: 100, lr: 2.89e-02, grad_scale: 128.0
2023-12-21 15:00:42,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0
2023-12-21 15:00:44,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=92613.33333333333, ans=0.0
2023-12-21 15:00:50,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=92680.0, ans=0.125
2023-12-21 15:00:50,896 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.529e+01 2.731e+01 2.913e+01 4.342e+01, threshold=5.462e+01, percent-clipped=0.0
2023-12-21 15:01:08,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=92813.33333333333, ans=0.0
2023-12-21 15:01:17,665 INFO [train.py:886] (1/4) Epoch 3, batch 4400, loss[loss=0.01712, audio_tagging_loss=0.01712, over 22649.00 frames. ], tot_loss[loss=0.01762, audio_tagging_loss=0.01762, over 4948334.75 frames. ], batch size: 107, lr: 2.89e-02, grad_scale: 128.0
2023-12-21 15:01:31,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5
2023-12-21 15:01:43,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=93013.33333333333, ans=0.0
2023-12-21 15:01:45,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=93013.33333333333, ans=0.0
2023-12-21 15:01:51,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2023-12-21 15:01:59,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=93146.66666666667, ans=0.125
2023-12-21 15:02:04,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.65 vs. limit=22.5
2023-12-21 15:02:10,670 INFO [train.py:886] (1/4) Epoch 3, batch 4450, loss[loss=0.02036, audio_tagging_loss=0.02036, over 24750.00 frames. ], tot_loss[loss=0.01764, audio_tagging_loss=0.01764, over 4944575.61 frames. ], batch size: 99, lr: 2.88e-02, grad_scale: 128.0
2023-12-21 15:02:12,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=93213.33333333333, ans=0.2
2023-12-21 15:02:33,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=93346.66666666667, ans=0.125
2023-12-21 15:02:34,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2023-12-21 15:02:34,336 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.637e+01 2.853e+01 3.131e+01 4.120e+01, threshold=5.707e+01, percent-clipped=0.0
2023-12-21 15:02:37,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=93346.66666666667, ans=0.2
2023-12-21 15:02:42,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=93413.33333333333, ans=0.0
2023-12-21 15:02:54,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=15.0
2023-12-21 15:02:57,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5
2023-12-21 15:03:02,132 INFO [train.py:886] (1/4) Epoch 3, batch 4500, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01757, audio_tagging_loss=0.01757, over 4946241.11 frames. ], batch size: 100, lr: 2.88e-02, grad_scale: 128.0
2023-12-21 15:03:16,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93613.33333333333, ans=0.1
2023-12-21 15:03:21,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=93613.33333333333, ans=0.125
2023-12-21 15:03:29,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=93680.0, ans=0.125
2023-12-21 15:03:35,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=15.0
2023-12-21 15:03:54,133 INFO [train.py:886] (1/4) Epoch 3, batch 4550, loss[loss=0.01691, audio_tagging_loss=0.01691, over 25000.00 frames. ], tot_loss[loss=0.01749, audio_tagging_loss=0.01749, over 4952813.93 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0
2023-12-21 15:03:59,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93880.0, ans=0.1
2023-12-21 15:04:01,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=93880.0, ans=0.125
2023-12-21 15:04:01,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=93880.0, ans=0.0
2023-12-21 15:04:01,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=93880.0, ans=0.125
2023-12-21 15:04:09,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=93946.66666666667, ans=0.5
2023-12-21 15:04:18,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+01 2.554e+01 2.788e+01 2.993e+01 3.924e+01, threshold=5.575e+01, percent-clipped=0.0
2023-12-21 15:04:21,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=94013.33333333333, ans=0.07
2023-12-21 15:04:31,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=94080.0, ans=0.0
2023-12-21 15:04:38,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=94146.66666666667, ans=0.125
2023-12-21 15:04:42,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=94146.66666666667, ans=0.125
2023-12-21 15:04:45,101 INFO [train.py:886] (1/4) Epoch 3, batch 4600, loss[loss=0.01932, audio_tagging_loss=0.01932, over 24888.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 4953503.07 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0
2023-12-21 15:04:50,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.50 vs. limit=12.0
2023-12-21 15:04:52,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=94213.33333333333, ans=0.125
2023-12-21 15:04:55,062 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.645e-01
2023-12-21 15:05:12,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0
2023-12-21 15:05:19,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
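The grad_scale field in the training lines is the dynamic loss-scaling factor used because the run has 'use_fp16': True: it rises from 128.0 to 256.0 around batch 4050 and falls back to 128.0 by batch 4300 above. That double-on-success, halve-on-overflow behavior is what PyTorch's GradScaler implements. A minimal self-contained sketch (the model, optimizer and data here are stand-ins, not the recipe's own):

    # Minimal fp16 training step with dynamic loss scaling, in the style of
    # torch.cuda.amp. Model/optimizer/data are placeholders for illustration.
    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(80, 527).cuda()        # stand-in for the zipformer
    optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = GradScaler(init_scale=128.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    for _ in range(10):                            # stand-in batches
        x = torch.randn(100, 80, device="cuda")
        y = torch.rand(100, 527, device="cuda")
        optimizer.zero_grad()
        with autocast():
            loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
        scaler.scale(loss).backward()              # gradients carry the scale
        scaler.step(optimizer)                     # skipped if inf/nan gradients
        scaler.update()                            # x2 after enough clean steps,
                                                   # x0.5 right after an overflow
        print(scaler.get_scale())                  # the value logged as grad_scale

The drop from 256.0 back to 128.0 between batches 4250 and 4300 is therefore the signature of a single overflowed (and skipped) step, not of anything wrong with the data.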
2023-12-21 15:05:21,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=94413.33333333333, ans=0.05
2023-12-21 15:05:23,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=94413.33333333333, ans=0.125
2023-12-21 15:05:34,959 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.409e+00
2023-12-21 15:05:35,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=94480.0, ans=0.125
2023-12-21 15:05:37,599 INFO [train.py:886] (1/4) Epoch 3, batch 4650, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01756, audio_tagging_loss=0.01756, over 4954631.38 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0
2023-12-21 15:05:45,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=94546.66666666667, ans=0.2
2023-12-21 15:05:50,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=94613.33333333333, ans=0.0
2023-12-21 15:06:01,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=94680.0, ans=0.0
2023-12-21 15:06:03,480 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.559e+01 2.807e+01 3.071e+01 3.874e+01, threshold=5.614e+01, percent-clipped=0.0
2023-12-21 15:06:05,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94680.0, ans=0.0
2023-12-21 15:06:11,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=15.0
2023-12-21 15:06:23,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=94813.33333333333, ans=0.035
2023-12-21 15:06:27,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=94880.0, ans=0.125
2023-12-21 15:06:28,130 INFO [train.py:886] (1/4) Epoch 3, batch 4700, loss[loss=0.01875, audio_tagging_loss=0.01875, over 24750.00 frames. ], tot_loss[loss=0.01751, audio_tagging_loss=0.01751, over 4949711.41 frames. ], batch size: 99, lr: 2.86e-02, grad_scale: 128.0
2023-12-21 15:06:37,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=94946.66666666667, ans=0.125
2023-12-21 15:06:50,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.04 vs. limit=22.5
2023-12-21 15:06:58,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=95080.0, ans=0.0
2023-12-21 15:07:10,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95146.66666666667, ans=0.0
2023-12-21 15:07:11,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=95146.66666666667, ans=0.125
2023-12-21 15:07:15,308 INFO [train.py:886] (1/4) Epoch 3, batch 4750, loss[loss=0.01807, audio_tagging_loss=0.01807, over 24750.00 frames. ], tot_loss[loss=0.01758, audio_tagging_loss=0.01758, over 4949927.48 frames. ], batch size: 99, lr: 2.86e-02, grad_scale: 128.0
2023-12-21 15:07:21,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=95213.33333333333, ans=0.125
2023-12-21 15:07:28,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=95280.0, ans=0.0
2023-12-21 15:07:28,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.61 vs. limit=10.0
2023-12-21 15:07:52,988 INFO [train.py:886] (1/4) Epoch 4, batch 0, loss[loss=0.03854, audio_tagging_loss=0.03854, over 25000.00 frames. ], tot_loss[loss=0.03854, audio_tagging_loss=0.03854, over 25000.00 frames. ], batch size: 100, lr: 2.67e-02, grad_scale: 128.0
2023-12-21 15:07:52,989 INFO [train.py:909] (1/4) Computing validation loss
2023-12-21 15:08:16,367 INFO [train.py:917] (1/4) Epoch 4, validation: loss=0.03936, audio_tagging_loss=0.03936, over 3737520.00 frames.
2023-12-21 15:08:16,367 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-21 15:08:25,263 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.624e+01 2.822e+01 3.250e+01 1.153e+02, threshold=5.643e+01, percent-clipped=3.0
2023-12-21 15:08:25,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=15.0
2023-12-21 15:08:37,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=95453.33333333333, ans=0.125
2023-12-21 15:08:47,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=95520.0, ans=0.125
2023-12-21 15:08:55,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=95520.0, ans=0.125
2023-12-21 15:08:58,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2023-12-21 15:09:07,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=95653.33333333333, ans=0.125
2023-12-21 15:09:08,010 INFO [train.py:886] (1/4) Epoch 4, batch 50, loss[loss=0.02169, audio_tagging_loss=0.02169, over 25000.00 frames. ], tot_loss[loss=0.02762, audio_tagging_loss=0.02762, over 1117493.60 frames. ], batch size: 100, lr: 2.67e-02, grad_scale: 128.0
2023-12-21 15:09:27,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=95720.0, ans=0.0
2023-12-21 15:09:39,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=95853.33333333333, ans=0.0
2023-12-21 15:09:50,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.0
2023-12-21 15:10:00,224 INFO [train.py:886] (1/4) Epoch 4, batch 100, loss[loss=0.0179, audio_tagging_loss=0.0179, over 25000.00 frames. ], tot_loss[loss=0.02383, audio_tagging_loss=0.02383, over 1974448.70 frames. ], batch size: 100, lr: 2.67e-02, grad_scale: 128.0
2023-12-21 15:10:02,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=95986.66666666667, ans=0.0
2023-12-21 15:10:07,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=95986.66666666667, ans=0.035
2023-12-21 15:10:08,522 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.883e+01 3.182e+01 3.510e+01 4.274e+01, threshold=6.364e+01, percent-clipped=0.0
2023-12-21 15:10:09,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=96053.33333333333, ans=0.0
2023-12-21 15:10:11,654 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.979e+00
2023-12-21 15:10:19,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=96120.0, ans=0.0
2023-12-21 15:10:26,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.11 vs. limit=5.0
2023-12-21 15:10:34,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=15.0
2023-12-21 15:10:51,521 INFO [train.py:886] (1/4) Epoch 4, batch 150, loss[loss=0.0165, audio_tagging_loss=0.0165, over 25000.00 frames. ], tot_loss[loss=0.02161, audio_tagging_loss=0.02161, over 2636319.03 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 128.0
2023-12-21 15:10:52,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=96320.0, ans=0.0
2023-12-21 15:11:02,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=96386.66666666667, ans=0.0
2023-12-21 15:11:12,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=96453.33333333333, ans=0.0
2023-12-21 15:11:19,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96453.33333333333, ans=0.1
2023-12-21 15:11:22,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=96520.0, ans=0.0
2023-12-21 15:11:42,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=96586.66666666667, ans=0.0
2023-12-21 15:11:44,283 INFO [train.py:886] (1/4) Epoch 4, batch 200, loss[loss=0.01807, audio_tagging_loss=0.01807, over 25000.00 frames. ], tot_loss[loss=0.02026, audio_tagging_loss=0.02026, over 3151306.07 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 128.0
2023-12-21 15:11:46,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96653.33333333333, ans=0.1
2023-12-21 15:11:47,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=12.0
2023-12-21 15:11:51,856 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.596e+01 2.833e+01 2.992e+01 3.762e+01, threshold=5.666e+01, percent-clipped=0.0
2023-12-21 15:11:58,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=96720.0, ans=0.05
2023-12-21 15:12:06,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96786.66666666667, ans=0.1
2023-12-21 15:12:18,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0
2023-12-21 15:12:35,173 INFO [train.py:886] (1/4) Epoch 4, batch 250, loss[loss=0.02008, audio_tagging_loss=0.02008, over 25000.00 frames. ], tot_loss[loss=0.01942, audio_tagging_loss=0.01942, over 3547710.04 frames. ], batch size: 100, lr: 2.65e-02, grad_scale: 128.0
2023-12-21 15:12:39,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=96986.66666666667, ans=0.0
2023-12-21 15:13:02,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0
2023-12-21 15:13:08,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=22.5
2023-12-21 15:13:20,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97253.33333333333, ans=0.125
2023-12-21 15:13:23,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.08 vs. limit=22.5
2023-12-21 15:13:26,985 INFO [train.py:886] (1/4) Epoch 4, batch 300, loss[loss=0.01875, audio_tagging_loss=0.01875, over 24750.00 frames. ], tot_loss[loss=0.01886, audio_tagging_loss=0.01886, over 3858143.29 frames. ], batch size: 99, lr: 2.65e-02, grad_scale: 128.0
2023-12-21 15:13:34,778 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.584e+01 2.801e+01 3.020e+01 3.817e+01, threshold=5.601e+01, percent-clipped=0.0
2023-12-21 15:13:38,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0
2023-12-21 15:13:40,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=97386.66666666667, ans=0.07
2023-12-21 15:13:43,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=97386.66666666667, ans=0.0
2023-12-21 15:13:53,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.08 vs. limit=22.5
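Each WARNING from optim.py:484 summarizes the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max) plus the clipping threshold in force; in every line above the threshold equals Clipping_scale=2.0 times the logged median (e.g. 5.666e+01 = 2.0 x 2.833e+01), and percent-clipped=3.0 at the epoch-4 validation point shows clipping actually firing after the loss spike. A rough, simplified sketch of that quantile-tracked clipping (an assumed reconstruction, not icefall's exact ScaledAdam code):

    # Sketch: clip each batch's gradient norm to clipping_scale x the median
    # of recent norms, and report quantiles like the WARNING lines above.
    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)   # recent total grad norms
            self.clipped = 0
            self.seen = 0

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            self.norms.append(norm.item())
            t = torch.tensor(sorted(self.norms))
            threshold = self.clipping_scale * t[len(t) // 2]  # 2.0 x median
            self.seen += 1
            if norm > threshold:                 # scale gradients down
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            quartiles = [t[int(f * (len(t) - 1))].item()
                         for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
            return quartiles, threshold.item(), 100.0 * self.clipped / self.seen

Because the threshold tracks the median, a one-off spike (like the 1.153e+02 max at epoch 4, batch 0) gets clipped without permanently lowering the threshold for later, well-behaved batches.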
2023-12-21 15:13:56,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.06 vs. limit=10.0
2023-12-21 15:14:07,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=97586.66666666667, ans=0.0
2023-12-21 15:14:10,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=97586.66666666667, ans=0.125
2023-12-21 15:14:11,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=97586.66666666667, ans=0.125
2023-12-21 15:14:11,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.13 vs. limit=22.5
2023-12-21 15:14:14,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=12.0
2023-12-21 15:14:19,499 INFO [train.py:886] (1/4) Epoch 4, batch 350, loss[loss=0.01771, audio_tagging_loss=0.01771, over 25000.00 frames. ], tot_loss[loss=0.01853, audio_tagging_loss=0.01853, over 4099959.68 frames. ], batch size: 100, lr: 2.65e-02, grad_scale: 128.0
2023-12-21 15:14:21,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97653.33333333333, ans=0.1
2023-12-21 15:14:32,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97720.0, ans=0.1
2023-12-21 15:14:33,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=97720.0, ans=0.07
2023-12-21 15:14:45,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5
2023-12-21 15:14:49,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5
2023-12-21 15:14:55,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=97853.33333333333, ans=0.125
2023-12-21 15:15:09,507 INFO [train.py:886] (1/4) Epoch 4, batch 400, loss[loss=0.01658, audio_tagging_loss=0.01658, over 25000.00 frames. ], tot_loss[loss=0.01827, audio_tagging_loss=0.01827, over 4289652.64 frames. ], batch size: 100, lr: 2.64e-02, grad_scale: 128.0
2023-12-21 15:15:13,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=97986.66666666667, ans=10.0
2023-12-21 15:15:18,622 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.499e+01 2.678e+01 2.862e+01 4.047e+01, threshold=5.355e+01, percent-clipped=0.0
2023-12-21 15:15:45,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98186.66666666667, ans=0.125
2023-12-21 15:15:54,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.07 vs. limit=10.0
2023-12-21 15:16:01,874 INFO [train.py:886] (1/4) Epoch 4, batch 450, loss[loss=0.0177, audio_tagging_loss=0.0177, over 25000.00 frames. ], tot_loss[loss=0.01795, audio_tagging_loss=0.01795, over 4441183.06 frames. ], batch size: 100, lr: 2.64e-02, grad_scale: 128.0
2023-12-21 15:16:05,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=98320.0, ans=0.125
2023-12-21 15:16:12,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=98386.66666666667, ans=0.0
2023-12-21 15:16:25,034 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.751e+01
2023-12-21 15:16:39,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.65 vs. limit=22.5
2023-12-21 15:16:39,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.69 vs. limit=22.5
2023-12-21 15:16:40,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=98520.0, ans=0.0
2023-12-21 15:16:52,055 INFO [train.py:886] (1/4) Epoch 4, batch 500, loss[loss=0.01467, audio_tagging_loss=0.01467, over 21909.00 frames. ], tot_loss[loss=0.01765, audio_tagging_loss=0.01765, over 4557469.59 frames. ], batch size: 107, lr: 2.64e-02, grad_scale: 128.0
2023-12-21 15:16:53,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=98653.33333333333, ans=0.125
2023-12-21 15:16:56,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=98653.33333333333, ans=0.1
2023-12-21 15:17:02,594 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.476e+01 2.666e+01 2.895e+01 4.028e+01, threshold=5.332e+01, percent-clipped=0.0
2023-12-21 15:17:03,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=98720.0, ans=0.2
2023-12-21 15:17:08,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98720.0, ans=0.1
2023-12-21 15:17:44,837 INFO [train.py:886] (1/4) Epoch 4, batch 550, loss[loss=0.01984, audio_tagging_loss=0.01984, over 21585.00 frames. ], tot_loss[loss=0.01741, audio_tagging_loss=0.01741, over 4645808.64 frames. ], batch size: 107, lr: 2.63e-02, grad_scale: 128.0
2023-12-21 15:17:46,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0
2023-12-21 15:17:56,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=99053.33333333333, ans=0.125
2023-12-21 15:17:56,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0
2023-12-21 15:18:12,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=99120.0, ans=0.02
2023-12-21 15:18:16,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=99186.66666666667, ans=0.125
2023-12-21 15:18:24,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=99253.33333333333, ans=0.125
2023-12-21 15:18:28,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=99253.33333333333, ans=0.09899494936611666
2023-12-21 15:18:37,715 INFO [train.py:886] (1/4) Epoch 4, batch 600, loss[loss=0.01789, audio_tagging_loss=0.01789, over 24953.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4716439.38 frames. ], batch size: 100, lr: 2.63e-02, grad_scale: 128.0
2023-12-21 15:18:39,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99320.0, ans=0.1
2023-12-21 15:18:43,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99320.0, ans=0.1
2023-12-21 15:18:45,282 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.527e+01 2.841e+01 3.010e+01 4.382e+01, threshold=5.682e+01, percent-clipped=0.0
2023-12-21 15:18:54,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=99386.66666666667, ans=22.5
2023-12-21 15:18:57,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=99453.33333333333, ans=0.2
2023-12-21 15:19:20,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0
2023-12-21 15:19:27,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=99653.33333333333, ans=0.5
2023-12-21 15:19:27,914 INFO [train.py:886] (1/4) Epoch 4, batch 650, loss[loss=0.01627, audio_tagging_loss=0.01627, over 24750.00 frames. ], tot_loss[loss=0.01758, audio_tagging_loss=0.01758, over 4763585.62 frames. ], batch size: 99, lr: 2.63e-02, grad_scale: 128.0
2023-12-21 15:19:30,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.71 vs. limit=22.5
2023-12-21 15:19:31,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0
2023-12-21 15:19:54,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=99786.66666666667, ans=0.0
2023-12-21 15:20:12,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=99920.0, ans=0.5
2023-12-21 15:20:19,913 INFO [train.py:886] (1/4) Epoch 4, batch 700, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24750.00 frames. ], tot_loss[loss=0.01745, audio_tagging_loss=0.01745, over 4802611.08 frames. ], batch size: 99, lr: 2.62e-02, grad_scale: 128.0
2023-12-21 15:20:22,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=99986.66666666667, ans=0.0
2023-12-21 15:20:22,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=99986.66666666667, ans=0.0
2023-12-21 15:20:25,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0
2023-12-21 15:20:26,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0
2023-12-21 15:20:26,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=99986.66666666667, ans=0.125
2023-12-21 15:20:27,573 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.543e+01 2.759e+01 3.003e+01 3.794e+01, threshold=5.518e+01, percent-clipped=0.0
2023-12-21 15:20:54,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=100186.66666666667, ans=0.2
2023-12-21 15:21:12,373 INFO [train.py:886] (1/4) Epoch 4, batch 750, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01733, audio_tagging_loss=0.01733, over 4837603.24 frames. ], batch size: 99, lr: 2.62e-02, grad_scale: 128.0
2023-12-21 15:21:15,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.99 vs. limit=22.5
2023-12-21 15:21:16,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=100320.0, ans=0.125
2023-12-21 15:21:18,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=100320.0, ans=0.125
2023-12-21 15:21:19,420 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.071e-01
2023-12-21 15:21:50,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=100520.0, ans=0.07
2023-12-21 15:22:03,770 INFO [train.py:886] (1/4) Epoch 4, batch 800, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4855904.39 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 128.0
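The lr values fall smoothly with the batch count inside an epoch (2.99e-02 down to 2.86e-02 across epoch 3) and step down again at the epoch boundary (2.67e-02 at epoch 4, batch 0), consistent with a schedule that discounts both batches and epochs, such as icefall's Eden with the run's lr_batches=7500 and lr_epochs=3.5. A sketch of that functional form is below; the exponent and the scheduler's internal step count are assumptions here, so this reproduces the shape of the curve (including the epoch-boundary drop ratio), not the exact logged magnitudes:

    # Sketch of an Eden-style learning-rate schedule that decays in both the
    # batch and epoch dimensions. Constants/exponent assumed for illustration.
    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # The epoch term alone predicts a ~7% drop from epoch 3 to epoch 4 at a
    # fixed step count, close to the logged 2.86e-02 -> 2.67e-02 (~6.6%).
    print(eden_lr(0.045, batch=12000, epoch=3))
    print(eden_lr(0.045, batch=12000, epoch=4))

Note that the scheduler is stepped on the trainer's own batch index, which is not the same counter as the batch_count shown in the ScheduledFloat lines.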
], batch size: 100, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:22:11,435 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.426e+01 2.587e+01 2.849e+01 3.873e+01, threshold=5.173e+01, percent-clipped=0.0 2023-12-21 15:22:12,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=100720.0, ans=0.2 2023-12-21 15:22:24,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=100786.66666666667, ans=0.125 2023-12-21 15:22:28,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=100786.66666666667, ans=0.2 2023-12-21 15:22:36,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=100853.33333333333, ans=0.1 2023-12-21 15:22:39,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100853.33333333333, ans=0.1 2023-12-21 15:22:54,918 INFO [train.py:886] (1/4) Epoch 4, batch 850, loss[loss=0.01987, audio_tagging_loss=0.01987, over 25000.00 frames. ], tot_loss[loss=0.01724, audio_tagging_loss=0.01724, over 4874769.81 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 128.0 2023-12-21 15:23:16,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=101120.0, ans=0.0 2023-12-21 15:23:29,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0 2023-12-21 15:23:32,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=101186.66666666667, ans=0.125 2023-12-21 15:23:41,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=101253.33333333333, ans=0.125 2023-12-21 15:23:43,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=101253.33333333333, ans=0.07 2023-12-21 15:23:45,074 INFO [train.py:886] (1/4) Epoch 4, batch 900, loss[loss=0.01799, audio_tagging_loss=0.01799, over 24942.00 frames. ], tot_loss[loss=0.01735, audio_tagging_loss=0.01735, over 4890852.55 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 128.0 2023-12-21 15:23:54,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.49 vs. 
limit=15.0 2023-12-21 15:23:54,877 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.627e+01 2.825e+01 3.078e+01 4.421e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-21 15:24:01,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101386.66666666667, ans=0.125 2023-12-21 15:24:07,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=101453.33333333333, ans=0.1 2023-12-21 15:24:30,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=101586.66666666667, ans=0.125 2023-12-21 15:24:31,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=101586.66666666667, ans=0.0 2023-12-21 15:24:32,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101586.66666666667, ans=0.1 2023-12-21 15:24:37,283 INFO [train.py:886] (1/4) Epoch 4, batch 950, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 4897364.86 frames. ], batch size: 99, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:24:56,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=101720.0, ans=0.0 2023-12-21 15:24:57,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=101786.66666666667, ans=0.0 2023-12-21 15:24:58,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=101786.66666666667, ans=0.125 2023-12-21 15:25:20,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0 2023-12-21 15:25:29,008 INFO [train.py:886] (1/4) Epoch 4, batch 1000, loss[loss=0.01625, audio_tagging_loss=0.01625, over 24750.00 frames. ], tot_loss[loss=0.01745, audio_tagging_loss=0.01745, over 4901650.74 frames. ], batch size: 99, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:25:36,518 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.499e+01 2.686e+01 2.949e+01 3.703e+01, threshold=5.372e+01, percent-clipped=0.0 2023-12-21 15:25:49,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-21 15:26:05,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102186.66666666667, ans=0.1 2023-12-21 15:26:06,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=102186.66666666667, ans=0.0 2023-12-21 15:26:08,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=102253.33333333333, ans=0.125 2023-12-21 15:26:11,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.00 vs. limit=10.0 2023-12-21 15:26:19,847 INFO [train.py:886] (1/4) Epoch 4, batch 1050, loss[loss=0.0187, audio_tagging_loss=0.0187, over 24750.00 frames. 
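
The recurring optim.py warnings print five grad-norm quartiles (min, 25%, median, 75%, max) followed by a clipping threshold; in every warning in this log the threshold is Clipping_scale times the median, up to display rounding (e.g. 5.682e+01 = 2.0 x 2.841e+01). A hedged sketch of that bookkeeping, assuming the quartiles are taken over the gradient norms of recent batches (the real logic lives in icefall's ScaledAdam in optim.py):

    import torch

    def clipping_stats(recent_grad_norms, clipping_scale=2.0):
        # recent_grad_norms: 1-D float tensor of per-batch gradient norms
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # scale * median
        pct = 100.0 * (recent_grad_norms > threshold).float().mean()
        return q, threshold, pct  # quartiles, threshold, percent-clipped
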
], tot_loss[loss=0.01732, audio_tagging_loss=0.01732, over 4909406.72 frames. ], batch size: 99, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:26:32,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2023-12-21 15:26:38,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=102386.66666666667, ans=0.0 2023-12-21 15:26:44,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-12-21 15:26:48,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102453.33333333333, ans=0.1 2023-12-21 15:26:49,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=15.0 2023-12-21 15:27:11,326 INFO [train.py:886] (1/4) Epoch 4, batch 1100, loss[loss=0.01823, audio_tagging_loss=0.01823, over 25000.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 4920424.52 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:27:19,910 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.582e+01 2.804e+01 3.074e+01 3.939e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-21 15:27:23,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.13 vs. limit=15.0 2023-12-21 15:27:35,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=15.0 2023-12-21 15:27:44,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102853.33333333333, ans=0.1 2023-12-21 15:27:45,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.26 vs. limit=22.5 2023-12-21 15:27:57,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102920.0, ans=0.125 2023-12-21 15:27:59,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=15.0 2023-12-21 15:28:02,460 INFO [train.py:886] (1/4) Epoch 4, batch 1150, loss[loss=0.01857, audio_tagging_loss=0.01857, over 25000.00 frames. ], tot_loss[loss=0.01714, audio_tagging_loss=0.01714, over 4927281.57 frames. 
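
The Whitening lines compare a whiteness statistic of a module's activations against a scheduled limit; a corrective gradient is applied only when the metric exceeds its limit, which is why most metrics hover at or just above their limits. A plausible single-group reduction of the statistic, assuming it is mean(lambda^2) / mean(lambda)^2 over the eigenvalues of the feature covariance (exactly 1.0 for perfectly white features, growing as variance concentrates in fewer directions); the actual computation in scaling.py avoids an explicit eigendecomposition:

    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels) activations from one module
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]     # (C, C) feature covariance
        eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
        return (eigs ** 2).mean() / eigs.mean() ** 2
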
], batch size: 100, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:28:14,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103053.33333333333, ans=0.1 2023-12-21 15:28:16,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=103053.33333333333, ans=0.0 2023-12-21 15:28:19,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=103053.33333333333, ans=0.125 2023-12-21 15:28:25,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2023-12-21 15:28:25,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=103120.0, ans=0.2 2023-12-21 15:28:28,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=103120.0, ans=0.125 2023-12-21 15:28:32,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=103186.66666666667, ans=0.2 2023-12-21 15:28:43,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2023-12-21 15:28:53,965 INFO [train.py:886] (1/4) Epoch 4, batch 1200, loss[loss=0.0166, audio_tagging_loss=0.0166, over 21975.00 frames. ], tot_loss[loss=0.01714, audio_tagging_loss=0.01714, over 4934633.35 frames. ], batch size: 107, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:29:02,241 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.608e+01 2.769e+01 2.975e+01 3.396e+01, threshold=5.537e+01, percent-clipped=0.0 2023-12-21 15:29:04,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.30 vs. limit=22.5 2023-12-21 15:29:13,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=103386.66666666667, ans=0.125 2023-12-21 15:29:24,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=103520.0, ans=0.125 2023-12-21 15:29:29,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0 2023-12-21 15:29:30,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5 2023-12-21 15:29:46,020 INFO [train.py:886] (1/4) Epoch 4, batch 1250, loss[loss=0.01565, audio_tagging_loss=0.01565, over 24750.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4931006.03 frames. 
], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:29:56,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103720.0, ans=0.1 2023-12-21 15:30:12,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=103786.66666666667, ans=0.0 2023-12-21 15:30:18,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=103853.33333333333, ans=0.125 2023-12-21 15:30:28,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=103920.0, ans=0.125 2023-12-21 15:30:30,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2023-12-21 15:30:35,965 INFO [train.py:886] (1/4) Epoch 4, batch 1300, loss[loss=0.01572, audio_tagging_loss=0.01572, over 24750.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4929073.76 frames. ], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:30:43,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.97 vs. limit=15.0 2023-12-21 15:30:45,068 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.570e+01 2.794e+01 3.000e+01 3.597e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 15:31:05,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.69 vs. limit=22.5 2023-12-21 15:31:15,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104186.66666666667, ans=0.1 2023-12-21 15:31:16,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=104186.66666666667, ans=0.125 2023-12-21 15:31:16,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0 2023-12-21 15:31:19,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104253.33333333333, ans=0.1 2023-12-21 15:31:24,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=104253.33333333333, ans=0.0 2023-12-21 15:31:26,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0 2023-12-21 15:31:27,402 INFO [train.py:886] (1/4) Epoch 4, batch 1350, loss[loss=0.01848, audio_tagging_loss=0.01848, over 25000.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4932400.73 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:31:27,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=104320.0, ans=0.125 2023-12-21 15:31:52,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. 
limit=15.0 2023-12-21 15:31:52,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2023-12-21 15:31:58,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104520.0, ans=0.1 2023-12-21 15:31:58,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0 2023-12-21 15:32:03,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2023-12-21 15:32:19,158 INFO [train.py:886] (1/4) Epoch 4, batch 1400, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4936856.13 frames. ], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:32:24,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=104653.33333333333, ans=0.125 2023-12-21 15:32:26,648 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.479e+01 2.726e+01 3.027e+01 3.760e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 15:32:42,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=104786.66666666667, ans=0.0 2023-12-21 15:32:42,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=104786.66666666667, ans=0.04949747468305833 2023-12-21 15:32:43,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=104786.66666666667, ans=0.125 2023-12-21 15:32:50,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2023-12-21 15:32:51,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=104853.33333333333, ans=0.125 2023-12-21 15:32:56,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=104853.33333333333, ans=0.2 2023-12-21 15:33:01,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.40 vs. limit=10.0 2023-12-21 15:33:02,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104920.0, ans=0.1 2023-12-21 15:33:08,515 INFO [train.py:886] (1/4) Epoch 4, batch 1450, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. ], tot_loss[loss=0.01699, audio_tagging_loss=0.01699, over 4939674.42 frames. 
], batch size: 99, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:33:08,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=104986.66666666667, ans=0.0 2023-12-21 15:33:37,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=105120.0, ans=0.0 2023-12-21 15:34:00,913 INFO [train.py:886] (1/4) Epoch 4, batch 1500, loss[loss=0.01707, audio_tagging_loss=0.01707, over 24750.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 4937145.29 frames. ], batch size: 99, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:34:08,809 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.551e+01 2.727e+01 2.901e+01 3.751e+01, threshold=5.454e+01, percent-clipped=0.0 2023-12-21 15:34:10,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=105386.66666666667, ans=0.125 2023-12-21 15:34:34,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=105520.0, ans=0.0 2023-12-21 15:34:46,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=105586.66666666667, ans=0.125 2023-12-21 15:34:50,185 INFO [train.py:886] (1/4) Epoch 4, batch 1550, loss[loss=0.01683, audio_tagging_loss=0.01683, over 25000.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 4933059.78 frames. ], batch size: 100, lr: 2.56e-02, grad_scale: 256.0 2023-12-21 15:35:25,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.98 vs. limit=22.5 2023-12-21 15:35:28,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=105853.33333333333, ans=0.125 2023-12-21 15:35:41,579 INFO [train.py:886] (1/4) Epoch 4, batch 1600, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.0173, audio_tagging_loss=0.0173, over 4930852.24 frames. ], batch size: 100, lr: 2.56e-02, grad_scale: 128.0 2023-12-21 15:35:49,968 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.541e+01 2.790e+01 3.125e+01 4.127e+01, threshold=5.579e+01, percent-clipped=0.0 2023-12-21 15:35:52,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=106053.33333333333, ans=0.0 2023-12-21 15:36:07,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-12-21 15:36:20,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=106186.66666666667, ans=0.0 2023-12-21 15:36:21,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.25 vs. limit=22.5 2023-12-21 15:36:34,464 INFO [train.py:886] (1/4) Epoch 4, batch 1650, loss[loss=0.01487, audio_tagging_loss=0.01487, over 24750.00 frames. ], tot_loss[loss=0.01726, audio_tagging_loss=0.01726, over 4932645.25 frames. 
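
grad_scale is the mixed-precision loss scale, not a model quantity: it is halved when scaled gradients overflow and grown again after a run of clean steps, which is presumably why it moves between 128.0 and 256.0 in the reports above. A sketch assuming the stock torch.cuda.amp.GradScaler drives it:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=128.0)
    # per batch:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skipped if the scaled grads overflowed
    #   scaler.update()          # halves the scale on overflow, doubles it
    #                            # after growth_interval overflow-free steps
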
], batch size: 99, lr: 2.56e-02, grad_scale: 128.0 2023-12-21 15:36:46,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=106386.66666666667, ans=0.125 2023-12-21 15:37:00,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=106453.33333333333, ans=0.125 2023-12-21 15:37:02,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=106453.33333333333, ans=10.0 2023-12-21 15:37:04,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=106520.0, ans=0.0 2023-12-21 15:37:04,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=106520.0, ans=0.125 2023-12-21 15:37:04,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=106520.0, ans=0.0 2023-12-21 15:37:08,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106520.0, ans=0.125 2023-12-21 15:37:12,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=106520.0, ans=0.125 2023-12-21 15:37:20,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.94 vs. limit=10.0 2023-12-21 15:37:22,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=106586.66666666667, ans=0.0 2023-12-21 15:37:24,423 INFO [train.py:886] (1/4) Epoch 4, batch 1700, loss[loss=0.01616, audio_tagging_loss=0.01616, over 25000.00 frames. ], tot_loss[loss=0.01719, audio_tagging_loss=0.01719, over 4935878.13 frames. ], batch size: 100, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:37:30,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=106653.33333333333, ans=0.125 2023-12-21 15:37:34,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-21 15:37:37,183 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.494e+01 2.744e+01 2.970e+01 4.279e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 15:37:43,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2023-12-21 15:37:45,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=106720.0, ans=0.125 2023-12-21 15:38:09,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.57 vs. limit=22.5 2023-12-21 15:38:18,932 INFO [train.py:886] (1/4) Epoch 4, batch 1750, loss[loss=0.01716, audio_tagging_loss=0.01716, over 25000.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4941215.13 frames. 
], batch size: 100, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:38:19,122 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.005e+00 2023-12-21 15:38:38,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=12.0 2023-12-21 15:38:48,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=15.0 2023-12-21 15:39:09,755 INFO [train.py:886] (1/4) Epoch 4, batch 1800, loss[loss=0.01665, audio_tagging_loss=0.01665, over 25000.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4945302.67 frames. ], batch size: 100, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:39:19,657 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.593e+01 2.724e+01 2.959e+01 3.559e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 15:39:23,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.04 vs. limit=6.0 2023-12-21 15:39:26,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=107386.66666666667, ans=0.125 2023-12-21 15:39:26,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=15.0 2023-12-21 15:39:29,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=107453.33333333333, ans=0.2 2023-12-21 15:39:39,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=107520.0, ans=0.125 2023-12-21 15:39:44,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=107520.0, ans=0.0 2023-12-21 15:39:46,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=107520.0, ans=0.125 2023-12-21 15:39:49,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=107520.0, ans=0.0 2023-12-21 15:40:00,868 INFO [train.py:886] (1/4) Epoch 4, batch 1850, loss[loss=0.01641, audio_tagging_loss=0.01641, over 24750.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 4946131.00 frames. ], batch size: 99, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:40:16,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=107720.0, ans=0.1 2023-12-21 15:40:44,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=107920.0, ans=0.125 2023-12-21 15:40:51,381 INFO [train.py:886] (1/4) Epoch 4, batch 1900, loss[loss=0.01846, audio_tagging_loss=0.01846, over 24750.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 4946168.78 frames. 
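
For tracking convergence it is usually enough to scrape the train.py:886 reports, which carry the epoch, batch index, and running tot_loss. A small parser matching exactly the report format shown in this log (parse_tot_loss is a hypothetical helper, not part of icefall):

    import re

    TOT_LOSS_RE = re.compile(
        r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([0-9.]+)")

    def parse_tot_loss(log_text):
        # yields (epoch, batch_idx, tot_loss) per train.py:886 report
        for m in TOT_LOSS_RE.finditer(log_text):
            yield int(m.group(1)), int(m.group(2)), float(m.group(3))
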
], batch size: 99, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:40:54,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=107986.66666666667, ans=0.0 2023-12-21 15:41:00,680 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.677e+01 2.844e+01 3.083e+01 3.785e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-21 15:41:09,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-12-21 15:41:13,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=108120.0, ans=0.125 2023-12-21 15:41:37,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-12-21 15:41:41,757 INFO [train.py:886] (1/4) Epoch 4, batch 1950, loss[loss=0.01487, audio_tagging_loss=0.01487, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4941803.36 frames. ], batch size: 99, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:41:45,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.53 vs. limit=10.0 2023-12-21 15:41:47,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=15.0 2023-12-21 15:42:07,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-21 15:42:16,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2023-12-21 15:42:25,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=108586.66666666667, ans=0.95 2023-12-21 15:42:29,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=108586.66666666667, ans=0.125 2023-12-21 15:42:33,634 INFO [train.py:886] (1/4) Epoch 4, batch 2000, loss[loss=0.01537, audio_tagging_loss=0.01537, over 25000.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 4947727.50 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:42:42,112 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.508e+01 2.716e+01 2.994e+01 3.930e+01, threshold=5.432e+01, percent-clipped=0.0 2023-12-21 15:43:24,459 INFO [train.py:886] (1/4) Epoch 4, batch 2050, loss[loss=0.01692, audio_tagging_loss=0.01692, over 24750.00 frames. ], tot_loss[loss=0.01702, audio_tagging_loss=0.01702, over 4951477.03 frames. ], batch size: 99, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:43:33,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=109053.33333333333, ans=0.125 2023-12-21 15:43:36,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. 
limit=22.5 2023-12-21 15:43:47,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=109120.0, ans=0.0 2023-12-21 15:43:55,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-12-21 15:44:00,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.13 vs. limit=15.0 2023-12-21 15:44:10,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109253.33333333333, ans=0.1 2023-12-21 15:44:14,508 INFO [train.py:886] (1/4) Epoch 4, batch 2100, loss[loss=0.01812, audio_tagging_loss=0.01812, over 25000.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4952484.94 frames. ], batch size: 100, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:44:23,906 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.560e+01 2.764e+01 2.957e+01 4.648e+01, threshold=5.528e+01, percent-clipped=0.0 2023-12-21 15:44:40,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.603e+00 2023-12-21 15:44:52,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-12-21 15:45:05,226 INFO [train.py:886] (1/4) Epoch 4, batch 2150, loss[loss=0.01791, audio_tagging_loss=0.01791, over 24750.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 4957738.78 frames. ], batch size: 99, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:45:23,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=15.0 2023-12-21 15:45:23,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-12-21 15:45:27,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=109786.66666666667, ans=0.0 2023-12-21 15:45:38,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109853.33333333333, ans=0.1 2023-12-21 15:45:55,493 INFO [train.py:886] (1/4) Epoch 4, batch 2200, loss[loss=0.0165, audio_tagging_loss=0.0165, over 24750.00 frames. ], tot_loss[loss=0.01731, audio_tagging_loss=0.01731, over 4953205.87 frames. 
], batch size: 99, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:45:57,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=109986.66666666667, ans=0.0 2023-12-21 15:46:04,583 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.592e+01 2.698e+01 2.922e+01 4.235e+01, threshold=5.395e+01, percent-clipped=0.0 2023-12-21 15:46:05,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=110053.33333333333, ans=0.125 2023-12-21 15:46:11,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110053.33333333333, ans=0.1 2023-12-21 15:46:11,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110053.33333333333, ans=0.1 2023-12-21 15:46:15,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=110120.0, ans=0.0 2023-12-21 15:46:20,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=110120.0, ans=15.0 2023-12-21 15:46:20,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-12-21 15:46:34,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2023-12-21 15:46:36,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=110253.33333333333, ans=0.125 2023-12-21 15:46:43,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=110253.33333333333, ans=0.125 2023-12-21 15:46:45,325 INFO [train.py:886] (1/4) Epoch 4, batch 2250, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24025.00 frames. ], tot_loss[loss=0.0173, audio_tagging_loss=0.0173, over 4948581.69 frames. ], batch size: 100, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:46:50,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=110320.0, ans=0.0 2023-12-21 15:46:55,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=110386.66666666667, ans=0.0 2023-12-21 15:47:24,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110520.0, ans=0.1 2023-12-21 15:47:28,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110586.66666666667, ans=0.125 2023-12-21 15:47:37,173 INFO [train.py:886] (1/4) Epoch 4, batch 2300, loss[loss=0.01783, audio_tagging_loss=0.01783, over 24750.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4946458.22 frames. 
], batch size: 99, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:47:45,882 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.543e+01 2.713e+01 2.924e+01 4.099e+01, threshold=5.427e+01, percent-clipped=0.0 2023-12-21 15:47:48,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.17 vs. limit=22.5 2023-12-21 15:47:50,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=110720.0, ans=0.125 2023-12-21 15:48:08,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2023-12-21 15:48:27,604 INFO [train.py:886] (1/4) Epoch 4, batch 2350, loss[loss=0.01822, audio_tagging_loss=0.01822, over 25000.00 frames. ], tot_loss[loss=0.0171, audio_tagging_loss=0.0171, over 4951946.41 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:48:29,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=110986.66666666667, ans=0.125 2023-12-21 15:48:33,445 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.503e+01 2023-12-21 15:48:49,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.57 vs. limit=10.0 2023-12-21 15:48:51,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=111120.0, ans=0.2 2023-12-21 15:48:52,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=111120.0, ans=0.1 2023-12-21 15:48:54,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=111120.0, ans=0.125 2023-12-21 15:48:55,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5 2023-12-21 15:49:01,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=111186.66666666667, ans=0.0 2023-12-21 15:49:09,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=15.0 2023-12-21 15:49:18,930 INFO [train.py:886] (1/4) Epoch 4, batch 2400, loss[loss=0.01941, audio_tagging_loss=0.01941, over 25000.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4956446.79 frames. 
], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:49:19,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=111320.0, ans=0.125 2023-12-21 15:49:25,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=111320.0, ans=0.2 2023-12-21 15:49:25,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111320.0, ans=0.1 2023-12-21 15:49:27,413 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.500e+01 2.719e+01 2.953e+01 3.930e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 15:49:27,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111386.66666666667, ans=0.125 2023-12-21 15:49:35,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-12-21 15:49:42,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=111453.33333333333, ans=0.0 2023-12-21 15:49:44,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=111453.33333333333, ans=0.1 2023-12-21 15:49:44,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=111453.33333333333, ans=0.0 2023-12-21 15:49:45,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.05 vs. limit=10.0 2023-12-21 15:50:02,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=111586.66666666667, ans=0.0 2023-12-21 15:50:11,412 INFO [train.py:886] (1/4) Epoch 4, batch 2450, loss[loss=0.01635, audio_tagging_loss=0.01635, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4962091.52 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:50:20,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111653.33333333333, ans=0.1 2023-12-21 15:50:37,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111786.66666666667, ans=0.1 2023-12-21 15:50:46,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111853.33333333333, ans=0.1 2023-12-21 15:50:51,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.31 vs. 
limit=12.0 2023-12-21 15:50:57,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111920.0, ans=0.125 2023-12-21 15:50:57,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111920.0, ans=0.125 2023-12-21 15:50:59,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=111920.0, ans=0.2 2023-12-21 15:51:02,639 INFO [train.py:886] (1/4) Epoch 4, batch 2500, loss[loss=0.02134, audio_tagging_loss=0.02134, over 24750.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4961613.40 frames. ], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:51:05,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=8.0 2023-12-21 15:51:06,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=111986.66666666667, ans=0.125 2023-12-21 15:51:10,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=111986.66666666667, ans=0.2 2023-12-21 15:51:12,020 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.590e+01 2.772e+01 2.981e+01 3.773e+01, threshold=5.543e+01, percent-clipped=0.0 2023-12-21 15:51:20,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=112053.33333333333, ans=0.125 2023-12-21 15:51:22,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=112053.33333333333, ans=0.2 2023-12-21 15:51:29,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112120.0, ans=0.1 2023-12-21 15:51:32,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=112120.0, ans=0.0 2023-12-21 15:51:38,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2023-12-21 15:51:43,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.88 vs. limit=22.5 2023-12-21 15:51:53,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=112253.33333333333, ans=0.125 2023-12-21 15:51:56,347 INFO [train.py:886] (1/4) Epoch 4, batch 2550, loss[loss=0.01931, audio_tagging_loss=0.01931, over 24750.00 frames. ], tot_loss[loss=0.01725, audio_tagging_loss=0.01725, over 4959860.66 frames. ], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:51:59,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=15.0 2023-12-21 15:52:18,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.83 vs. 
limit=15.0 2023-12-21 15:52:26,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=112453.33333333333, ans=0.0 2023-12-21 15:52:28,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=112520.0, ans=0.0 2023-12-21 15:52:30,085 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:52:32,150 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.204e-01 2023-12-21 15:52:47,841 INFO [train.py:886] (1/4) Epoch 4, batch 2600, loss[loss=0.01948, audio_tagging_loss=0.01948, over 24750.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4953605.61 frames. ], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:52:48,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=112653.33333333333, ans=0.0 2023-12-21 15:52:52,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2023-12-21 15:52:58,416 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.608e+01 2.784e+01 3.013e+01 3.853e+01, threshold=5.568e+01, percent-clipped=0.0 2023-12-21 15:53:01,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=112720.0, ans=0.125 2023-12-21 15:53:13,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=15.0 2023-12-21 15:53:16,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=112786.66666666667, ans=0.0 2023-12-21 15:53:29,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.91 vs. limit=10.0 2023-12-21 15:53:37,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=112920.0, ans=0.125 2023-12-21 15:53:40,009 INFO [train.py:886] (1/4) Epoch 4, batch 2650, loss[loss=0.01706, audio_tagging_loss=0.01706, over 25000.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 4950990.89 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:53:41,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=112986.66666666667, ans=0.125 2023-12-21 15:53:47,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.77 vs. limit=22.5 2023-12-21 15:54:12,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.45 vs. 
limit=22.5 2023-12-21 15:54:19,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=113253.33333333333, ans=0.0 2023-12-21 15:54:29,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113253.33333333333, ans=0.1 2023-12-21 15:54:31,520 INFO [train.py:886] (1/4) Epoch 4, batch 2700, loss[loss=0.01729, audio_tagging_loss=0.01729, over 25000.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 4954183.51 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:54:40,198 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.542e+01 2.756e+01 2.979e+01 4.286e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 15:54:43,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=113386.66666666667, ans=0.125 2023-12-21 15:54:43,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.96 vs. limit=15.0 2023-12-21 15:54:47,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2023-12-21 15:54:52,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-12-21 15:55:10,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2023-12-21 15:55:11,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=113586.66666666667, ans=0.2 2023-12-21 15:55:16,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-12-21 15:55:21,443 INFO [train.py:886] (1/4) Epoch 4, batch 2750, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4954138.18 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:55:29,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=113653.33333333333, ans=0.125 2023-12-21 15:55:42,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=113786.66666666667, ans=0.125 2023-12-21 15:55:48,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=113786.66666666667, ans=0.125 2023-12-21 15:56:10,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=113920.0, ans=0.125 2023-12-21 15:56:13,784 INFO [train.py:886] (1/4) Epoch 4, batch 2800, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4951740.86 frames. 
], batch size: 99, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:56:17,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=113986.66666666667, ans=0.0 2023-12-21 15:56:22,308 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.583e+01 2.780e+01 3.121e+01 4.159e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 15:56:31,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=114120.0, ans=0.0 2023-12-21 15:56:49,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=114186.66666666667, ans=0.0 2023-12-21 15:57:03,912 INFO [train.py:886] (1/4) Epoch 4, batch 2850, loss[loss=0.02016, audio_tagging_loss=0.02016, over 25000.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 4942298.77 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:57:41,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=114520.0, ans=0.125 2023-12-21 15:57:55,964 INFO [train.py:886] (1/4) Epoch 4, batch 2900, loss[loss=0.01647, audio_tagging_loss=0.01647, over 25000.00 frames. ], tot_loss[loss=0.01712, audio_tagging_loss=0.01712, over 4939819.81 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:58:04,680 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.617e+01 2.786e+01 3.010e+01 3.894e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-21 15:58:16,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=15.0 2023-12-21 15:58:19,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=114786.66666666667, ans=0.2 2023-12-21 15:58:20,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=114786.66666666667, ans=0.0 2023-12-21 15:58:23,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.86 vs. limit=22.5 2023-12-21 15:58:37,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=114920.0, ans=0.125 2023-12-21 15:58:48,198 INFO [train.py:886] (1/4) Epoch 4, batch 2950, loss[loss=0.01686, audio_tagging_loss=0.01686, over 24750.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4943951.43 frames. ], batch size: 99, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:59:01,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2023-12-21 15:59:02,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=115053.33333333333, ans=0.0 2023-12-21 15:59:38,324 INFO [train.py:886] (1/4) Epoch 4, batch 3000, loss[loss=0.01744, audio_tagging_loss=0.01744, over 24750.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4946492.85 frames. 
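
Every report in this log has loss equal to audio_tagging_loss, i.e. the audio-tagging criterion is the only loss term in this recipe. A hedged sketch of such a criterion, assuming the standard multi-label binary cross-entropy over event classes (the actual definition lives in the recipe's train.py and is not reproduced here):

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits, targets):
        # logits, targets: (batch, num_events); targets are multi-hot 0/1
        return F.binary_cross_entropy_with_logits(logits, targets)
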
], batch size: 99, lr: 2.47e-02, grad_scale: 128.0 2023-12-21 15:59:38,325 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 15:59:52,494 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9766, 4.4925, 4.9516, 4.6254], device='cuda:1') 2023-12-21 15:59:55,939 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7928, 2.7988, 2.8246, 2.8510], device='cuda:1') 2023-12-21 15:59:59,362 INFO [train.py:917] (1/4) Epoch 4, validation: loss=0.04177, audio_tagging_loss=0.04177, over 3737520.00 frames. 2023-12-21 15:59:59,362 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 16:00:07,861 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.539e+01 2.719e+01 2.990e+01 3.720e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 16:00:23,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=115453.33333333333, ans=0.0 2023-12-21 16:00:44,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2023-12-21 16:00:50,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-21 16:00:51,251 INFO [train.py:886] (1/4) Epoch 4, batch 3050, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4948948.68 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 128.0 2023-12-21 16:00:52,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=115653.33333333333, ans=0.125 2023-12-21 16:00:59,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=115653.33333333333, ans=0.125 2023-12-21 16:01:04,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=115720.0, ans=0.0 2023-12-21 16:01:04,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=115720.0, ans=0.125 2023-12-21 16:01:42,107 INFO [train.py:886] (1/4) Epoch 4, batch 3100, loss[loss=0.01633, audio_tagging_loss=0.01633, over 24750.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4952732.12 frames. ], batch size: 99, lr: 2.47e-02, grad_scale: 128.0 2023-12-21 16:01:46,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=115986.66666666667, ans=0.0 2023-12-21 16:01:46,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=115986.66666666667, ans=0.125 2023-12-21 16:01:52,026 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.579e+01 2.744e+01 2.909e+01 3.915e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 16:01:55,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.43 vs. 
limit=22.5 2023-12-21 16:02:14,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=116186.66666666667, ans=0.125 2023-12-21 16:02:22,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=116186.66666666667, ans=0.0 2023-12-21 16:02:22,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. limit=10.0 2023-12-21 16:02:24,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=116253.33333333333, ans=0.0 2023-12-21 16:02:25,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=116253.33333333333, ans=0.0 2023-12-21 16:02:34,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116320.0, ans=0.125 2023-12-21 16:02:34,952 INFO [train.py:886] (1/4) Epoch 4, batch 3150, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4948031.89 frames. ], batch size: 99, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:02:39,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=116320.0, ans=0.125 2023-12-21 16:03:08,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116520.0, ans=0.1 2023-12-21 16:03:16,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.24 vs. limit=22.5 2023-12-21 16:03:27,449 INFO [train.py:886] (1/4) Epoch 4, batch 3200, loss[loss=0.01513, audio_tagging_loss=0.01513, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4948780.19 frames. 
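The zipformer.py:1858 lines in the validation block above dump attn_weights_entropy per attention head (four heads per tensor here). Entropy of the attention distribution is a cheap collapse detector: values near zero mean a head attends to a single position, larger values mean it stays diffuse. A sketch of how such a statistic can be computed; the helper and its assumed (num_heads, tgt_len, src_len) layout are illustrative, not necessarily zipformer.py's exact reduction:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, tgt_len, src_len), each row a distribution.
        Returns mean entropy in nats per head; log(src_len) is the max."""
        eps = 1.0e-20
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, tgt_len)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))  # ~4.1 per head; log(100) ~ 4.6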
], batch size: 100, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:03:32,406 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.725e+00 2023-12-21 16:03:32,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=116653.33333333333, ans=0.125 2023-12-21 16:03:36,017 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.555e+01 2.782e+01 3.048e+01 4.020e+01, threshold=5.565e+01, percent-clipped=0.0 2023-12-21 16:03:39,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=116720.0, ans=0.0 2023-12-21 16:03:57,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=116853.33333333333, ans=0.0 2023-12-21 16:03:59,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=116853.33333333333, ans=0.2 2023-12-21 16:04:02,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=116853.33333333333, ans=0.0 2023-12-21 16:04:06,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=116853.33333333333, ans=22.5 2023-12-21 16:04:12,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=116920.0, ans=0.05 2023-12-21 16:04:18,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2023-12-21 16:04:18,493 INFO [train.py:886] (1/4) Epoch 4, batch 3250, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01699, audio_tagging_loss=0.01699, over 4950718.71 frames. ], batch size: 100, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:04:30,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=117053.33333333333, ans=0.125 2023-12-21 16:04:36,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=117053.33333333333, ans=0.2 2023-12-21 16:04:50,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117186.66666666667, ans=0.1 2023-12-21 16:04:59,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=117253.33333333333, ans=0.125 2023-12-21 16:05:05,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=117253.33333333333, ans=0.125 2023-12-21 16:05:11,476 INFO [train.py:886] (1/4) Epoch 4, batch 3300, loss[loss=0.01853, audio_tagging_loss=0.01853, over 21578.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4950475.91 frames. 
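The recurring scaling.py:1022 "Whitening" lines compare a per-module metric against a limit (e.g. metric=9.87 vs. limit=10.0 just above); when the metric exceeds its limit, the Whiten module penalizes the activations so their channel covariance moves back toward a multiple of the identity. One natural form for such a metric, shown as a sketch (icefall's actual _whitening_metric may differ in detail): with covariance eigenvalues lambda_i over C channels, metric = C * sum(lambda_i^2) / (sum(lambda_i))^2, which is 1.0 for perfectly "white" features and approaches C as the variance collapses into a single direction.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels). Returns 1.0 when the channel
        covariance is proportional to the identity; up to num_channels
        when the features are rank-1."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]          # (C, C)
        num_channels = cov.shape[0]
        # (cov * cov).sum() == trace(cov @ cov) == sum of squared eigenvalues
        return (num_channels * (cov * cov).sum() / cov.trace() ** 2).item()

    print(whitening_metric(torch.randn(8000, 512)))               # ~1
    print(whitening_metric(torch.randn(8000, 1).repeat(1, 512)))  # 512.0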
], batch size: 107, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:05:20,923 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.625e+01 2.805e+01 3.075e+01 3.853e+01, threshold=5.610e+01, percent-clipped=0.0 2023-12-21 16:05:25,822 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 16:05:28,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=117386.66666666667, ans=0.2 2023-12-21 16:05:42,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117520.0, ans=0.1 2023-12-21 16:05:43,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=117520.0, ans=0.125 2023-12-21 16:05:49,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2023-12-21 16:05:49,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=117520.0, ans=0.0 2023-12-21 16:05:56,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5 2023-12-21 16:06:00,235 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.909e-02 2023-12-21 16:06:02,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.21 vs. limit=22.5 2023-12-21 16:06:03,538 INFO [train.py:886] (1/4) Epoch 4, batch 3350, loss[loss=0.01912, audio_tagging_loss=0.01912, over 25000.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4956182.22 frames. ], batch size: 100, lr: 2.45e-02, grad_scale: 128.0 2023-12-21 16:06:09,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=15.0 2023-12-21 16:06:12,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117653.33333333333, ans=0.1 2023-12-21 16:06:32,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117853.33333333333, ans=0.125 2023-12-21 16:06:37,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=117853.33333333333, ans=0.2 2023-12-21 16:06:54,566 INFO [train.py:886] (1/4) Epoch 4, batch 3400, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4955538.97 frames. 
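Each optim.py:484 warning summarizes the recent distribution of gradient norms as five quantiles (min, Q1, median, Q3, max) plus the clipping threshold in force. Note that every logged threshold is exactly 2.0 times the reported median, matching Clipping_scale=2.0, and percent-clipped says how often a batch was actually scaled down (0.0 throughout this stretch, so clipping is dormant). A sketch of median-relative clipping in that spirit; the window size and the plain global-norm clip are assumptions, since ScaledAdam's built-in clipping works per parameter group and differs in detail:

    import torch
    from collections import deque

    class MedianClipper:
        def __init__(self, window: int = 200, clipping_scale: float = 2.0):
            self.norms = deque(maxlen=window)
            self.clipping_scale = clipping_scale
            self.clipped = self.seen = 0

        def clip_(self, params) -> None:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # 2x the median
            self.seen += 1
            if norm > threshold:
                self.clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)
            print("grad-norm quartiles "
                  + " ".join(f"{v:.3e}" for v in q.tolist())
                  + f", threshold={threshold:.3e}"
                  + f", percent-clipped={100.0 * self.clipped / self.seen:.1f}")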
], batch size: 100, lr: 2.45e-02, grad_scale: 128.0 2023-12-21 16:07:03,687 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.590e+01 2.740e+01 3.014e+01 4.535e+01, threshold=5.480e+01, percent-clipped=0.0 2023-12-21 16:07:05,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=118053.33333333333, ans=0.0 2023-12-21 16:07:21,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=118120.0, ans=0.0 2023-12-21 16:07:39,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=118253.33333333333, ans=0.0 2023-12-21 16:07:39,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2023-12-21 16:07:41,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.20 vs. limit=10.0 2023-12-21 16:07:47,982 INFO [train.py:886] (1/4) Epoch 4, batch 3450, loss[loss=0.01895, audio_tagging_loss=0.01895, over 25000.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4947015.72 frames. ], batch size: 100, lr: 2.45e-02, grad_scale: 128.0 2023-12-21 16:08:11,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0 2023-12-21 16:08:26,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=118520.0, ans=0.05 2023-12-21 16:08:34,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118586.66666666667, ans=0.1 2023-12-21 16:08:38,284 INFO [train.py:886] (1/4) Epoch 4, batch 3500, loss[loss=0.01914, audio_tagging_loss=0.01914, over 25000.00 frames. ], tot_loss[loss=0.01719, audio_tagging_loss=0.01719, over 4946343.06 frames. ], batch size: 100, lr: 2.44e-02, grad_scale: 128.0 2023-12-21 16:08:43,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118653.33333333333, ans=0.1 2023-12-21 16:08:48,899 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.564e+01 2.726e+01 3.075e+01 4.208e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 16:09:00,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118786.66666666667, ans=0.1 2023-12-21 16:09:06,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=12.0 2023-12-21 16:09:10,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.52 vs. 
limit=12.0 2023-12-21 16:09:15,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=118853.33333333333, ans=0.0 2023-12-21 16:09:16,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=118853.33333333333, ans=10.0 2023-12-21 16:09:16,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=118853.33333333333, ans=0.2 2023-12-21 16:09:29,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=118986.66666666667, ans=0.125 2023-12-21 16:09:30,647 INFO [train.py:886] (1/4) Epoch 4, batch 3550, loss[loss=0.02029, audio_tagging_loss=0.02029, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4941423.65 frames. ], batch size: 100, lr: 2.44e-02, grad_scale: 128.0 2023-12-21 16:09:32,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118986.66666666667, ans=0.1 2023-12-21 16:09:41,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=119053.33333333333, ans=0.0 2023-12-21 16:09:49,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=119053.33333333333, ans=0.125 2023-12-21 16:10:01,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=119186.66666666667, ans=0.09899494936611666 2023-12-21 16:10:11,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.86 vs. limit=15.0 2023-12-21 16:10:12,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.40 vs. limit=22.5 2023-12-21 16:10:22,232 INFO [train.py:886] (1/4) Epoch 4, batch 3600, loss[loss=0.02009, audio_tagging_loss=0.02009, over 24750.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4938729.64 frames. ], batch size: 99, lr: 2.44e-02, grad_scale: 128.0 2023-12-21 16:10:27,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2023-12-21 16:10:32,449 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.504e+01 2.701e+01 2.952e+01 4.327e+01, threshold=5.401e+01, percent-clipped=0.0 2023-12-21 16:10:33,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119386.66666666667, ans=0.125 2023-12-21 16:10:35,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.13 vs. 
limit=22.5 2023-12-21 16:10:49,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=119453.33333333333, ans=0.125 2023-12-21 16:10:56,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119520.0, ans=0.1 2023-12-21 16:10:58,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=119520.0, ans=10.0 2023-12-21 16:11:03,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=119586.66666666667, ans=0.125 2023-12-21 16:11:12,788 INFO [train.py:886] (1/4) Epoch 4, batch 3650, loss[loss=0.01716, audio_tagging_loss=0.01716, over 24750.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 4943328.08 frames. ], batch size: 99, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:11:22,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=119720.0, ans=0.0 2023-12-21 16:11:24,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0 2023-12-21 16:11:24,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.11 vs. limit=6.0 2023-12-21 16:11:25,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119720.0, ans=0.125 2023-12-21 16:11:36,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=119786.66666666667, ans=0.125 2023-12-21 16:11:41,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0 2023-12-21 16:11:42,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=119786.66666666667, ans=0.2 2023-12-21 16:11:42,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-12-21 16:11:43,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2023-12-21 16:11:57,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.87 vs. limit=5.0 2023-12-21 16:11:57,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=119920.0, ans=0.125 2023-12-21 16:12:04,968 INFO [train.py:886] (1/4) Epoch 4, batch 3700, loss[loss=0.01636, audio_tagging_loss=0.01636, over 25000.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 4950485.91 frames. 
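The bypass.scale_min and bypass_mid.scale_min entries (ans=0.2 at this stage) belong to the learned bypass connections around each zipformer layer: the layer output is mixed with its input through a per-channel scale clamped to [scale_min, 1], and scale_min itself is one of the ScheduledFloats. A simplified sketch of that mixing; the module below is illustrative, reduced from zipformer's actual Bypass:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        """y = x + s * (f(x) - x), with per-channel s clamped to
        [scale_min, 1]: s near 0 makes the layer close to an identity,
        s = 1 uses the layer output unchanged."""

        def __init__(self, num_channels: int, scale_min: float = 0.2):
            super().__init__()
            self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
            self.scale_min = scale_min  # a ScheduledFloat in the recipe

        def forward(self, x: torch.Tensor, layer_out: torch.Tensor):
            s = self.scale.clamp(min=self.scale_min, max=1.0)
            return x + s * (layer_out - x)

    bypass = Bypass(num_channels=256)
    x = torch.randn(10, 256)
    y = bypass(x, torch.tanh(x))   # tanh stands in for a layer's output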
], batch size: 100, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:12:14,753 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.607e+01 2.781e+01 3.058e+01 3.851e+01, threshold=5.562e+01, percent-clipped=0.0 2023-12-21 16:12:30,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-12-21 16:12:35,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=15.0 2023-12-21 16:12:52,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=120253.33333333333, ans=0.125 2023-12-21 16:12:55,114 INFO [train.py:886] (1/4) Epoch 4, batch 3750, loss[loss=0.01677, audio_tagging_loss=0.01677, over 24750.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 4946331.27 frames. ], batch size: 99, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:12:59,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=120320.0, ans=0.125 2023-12-21 16:13:08,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120386.66666666667, ans=0.1 2023-12-21 16:13:15,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=120453.33333333333, ans=0.125 2023-12-21 16:13:44,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=120586.66666666667, ans=0.0 2023-12-21 16:13:46,569 INFO [train.py:886] (1/4) Epoch 4, batch 3800, loss[loss=0.016, audio_tagging_loss=0.016, over 24750.00 frames. ], tot_loss[loss=0.01702, audio_tagging_loss=0.01702, over 4943222.72 frames. ], batch size: 99, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:13:50,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=120653.33333333333, ans=0.125 2023-12-21 16:13:53,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=120653.33333333333, ans=0.2 2023-12-21 16:13:56,038 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.567e+01 2.797e+01 3.040e+01 4.165e+01, threshold=5.595e+01, percent-clipped=0.0 2023-12-21 16:14:08,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=120786.66666666667, ans=0.125 2023-12-21 16:14:11,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=120786.66666666667, ans=0.125 2023-12-21 16:14:16,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=120853.33333333333, ans=0.0 2023-12-21 16:14:22,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=120853.33333333333, ans=0.0 2023-12-21 16:14:38,129 INFO [train.py:886] (1/4) Epoch 4, batch 3850, loss[loss=0.02056, audio_tagging_loss=0.02056, over 25000.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4942230.36 frames. 
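The many balancer*.prob, min_positive, max_positive, and max_abs values describe activation balancers: modules that are an identity in the forward pass but occasionally (with the logged prob, 0.125 here) adjust the gradient when a channel's statistics drift out of range, e.g. too few positive activations (min_positive) or too large a mean magnitude (max_abs). A toy version of the idea; the correction strength and exact conditions are assumptions, and scaling.py's Balancer shapes the correction far more carefully:

    import torch

    class SimpleBalancer(torch.autograd.Function):
        """Identity forward; on a random `prob` fraction of calls the
        backward pass nudges offending channels: raises channels with
        too few positive values, shrinks channels whose mean |x| is too
        large. x is assumed to be (num_frames, num_channels)."""

        @staticmethod
        def forward(ctx, x, min_positive=0.05, max_abs=10.0,
                    prob=0.125, strength=1.0e-4):
            ctx.save_for_backward(x)
            ctx.active = bool(torch.rand(()) < prob)
            ctx.cfg = (min_positive, max_abs, strength)
            return x.clone()

        @staticmethod
        def backward(ctx, grad):
            if not ctx.active:
                return grad, None, None, None, None
            (x,) = ctx.saved_tensors
            min_positive, max_abs, strength = ctx.cfg
            pos_frac = (x > 0).float().mean(dim=0, keepdim=True)
            too_neg = (pos_frac < min_positive).float()
            too_big = (x.abs().mean(dim=0, keepdim=True) > max_abs).float()
            # descending along -too_neg raises x; along +sign(x) shrinks |x|
            return (grad + strength * (too_big * x.sign() - too_neg),
                    None, None, None, None)

    x = torch.randn(100, 256, requires_grad=True)
    SimpleBalancer.apply(x).sum().backward()  # x.grad ~ 1 plus any nudges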
], batch size: 100, lr: 2.42e-02, grad_scale: 128.0 2023-12-21 16:14:40,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=120986.66666666667, ans=0.1 2023-12-21 16:14:41,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=120986.66666666667, ans=0.125 2023-12-21 16:15:03,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=121120.0, ans=0.125 2023-12-21 16:15:19,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-21 16:15:25,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121253.33333333333, ans=0.0 2023-12-21 16:15:28,378 INFO [train.py:886] (1/4) Epoch 4, batch 3900, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.0169, audio_tagging_loss=0.0169, over 4948078.20 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0 2023-12-21 16:15:34,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=121320.0, ans=0.1 2023-12-21 16:15:39,374 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.569e+01 2.731e+01 2.970e+01 3.861e+01, threshold=5.462e+01, percent-clipped=0.0 2023-12-21 16:15:42,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=121386.66666666667, ans=0.125 2023-12-21 16:15:43,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=121386.66666666667, ans=0.125 2023-12-21 16:15:54,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2023-12-21 16:15:56,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=121453.33333333333, ans=0.125 2023-12-21 16:16:00,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=121520.0, ans=0.125 2023-12-21 16:16:06,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121520.0, ans=0.1 2023-12-21 16:16:06,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.16 vs. limit=22.5 2023-12-21 16:16:14,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-12-21 16:16:21,510 INFO [train.py:886] (1/4) Epoch 4, batch 3950, loss[loss=0.01579, audio_tagging_loss=0.01579, over 25000.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 4952472.08 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0 2023-12-21 16:16:22,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. 
limit=6.0 2023-12-21 16:16:24,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=121653.33333333333, ans=0.125 2023-12-21 16:16:32,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.35 vs. limit=8.0 2023-12-21 16:17:08,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=121920.0, ans=0.125 2023-12-21 16:17:10,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=121920.0, ans=0.2 2023-12-21 16:17:12,384 INFO [train.py:886] (1/4) Epoch 4, batch 4000, loss[loss=0.01657, audio_tagging_loss=0.01657, over 25000.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 4960136.06 frames. ], batch size: 100, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:17:23,134 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.583e+01 2.753e+01 2.897e+01 3.593e+01, threshold=5.506e+01, percent-clipped=0.0 2023-12-21 16:17:27,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=122053.33333333333, ans=0.0 2023-12-21 16:17:38,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=122120.0, ans=0.125 2023-12-21 16:17:43,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=122186.66666666667, ans=0.125 2023-12-21 16:17:45,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=122186.66666666667, ans=0.0 2023-12-21 16:17:57,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=122253.33333333333, ans=0.2 2023-12-21 16:17:59,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-12-21 16:18:03,847 INFO [train.py:886] (1/4) Epoch 4, batch 4050, loss[loss=0.01564, audio_tagging_loss=0.01564, over 24750.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 4964316.18 frames. ], batch size: 99, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:18:11,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.24 vs. limit=22.5 2023-12-21 16:18:24,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=122453.33333333333, ans=0.125 2023-12-21 16:18:32,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=122453.33333333333, ans=0.0 2023-12-21 16:18:43,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=122586.66666666667, ans=0.125 2023-12-21 16:18:53,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=122586.66666666667, ans=0.2 2023-12-21 16:18:56,111 INFO [train.py:886] (1/4) Epoch 4, batch 4100, loss[loss=0.01542, audio_tagging_loss=0.01542, over 24750.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4962636.47 frames. 
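In every train.py:886 line the first bracketed loss is the current batch alone, while tot_loss is a frame-weighted running average, which is why tot_loss moves so smoothly (0.0168 to 0.0172 across this stretch) while per-batch values bounce between roughly 0.014 and 0.021. The fractional frame totals (e.g. 4962636.47) suggest the accumulator is exponentially decayed rather than a plain cumulative sum; a sketch under that assumption, with an invented decay constant:

    class LossTracker:
        """Frame-weighted, exponentially decayed running average."""

        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.frames = 0.0
            self.loss_sum = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.frames = self.frames * self.decay + num_frames
            self.loss_sum = self.loss_sum * self.decay + loss * num_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = LossTracker()
    for _ in range(2000):                  # ~25000 frames per batch here
        tracker.update(loss=0.017, num_frames=25000.0)
    print(f"tot_loss={tracker.tot_loss:.5f}, over {tracker.frames:.2f} frames")

With decay 0.995 and ~25k frames per batch the effective window settles near five million frames, the same order as the ~4.95e6 totals logged here.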
], batch size: 99, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:19:06,546 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.561e+01 2.807e+01 3.053e+01 3.767e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 16:19:09,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-21 16:19:18,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122786.66666666667, ans=0.1 2023-12-21 16:19:37,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=122920.0, ans=0.0 2023-12-21 16:19:46,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=122920.0, ans=0.2 2023-12-21 16:19:47,656 INFO [train.py:886] (1/4) Epoch 4, batch 4150, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01692, audio_tagging_loss=0.01692, over 4956201.12 frames. ], batch size: 100, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:19:52,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122986.66666666667, ans=0.125 2023-12-21 16:19:57,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=122986.66666666667, ans=0.125 2023-12-21 16:20:21,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=123186.66666666667, ans=0.125 2023-12-21 16:20:40,636 INFO [train.py:886] (1/4) Epoch 4, batch 4200, loss[loss=0.01811, audio_tagging_loss=0.01811, over 23998.00 frames. ], tot_loss[loss=0.01687, audio_tagging_loss=0.01687, over 4945083.80 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:20:49,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=123386.66666666667, ans=0.125 2023-12-21 16:20:50,049 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.575e+01 2.802e+01 3.033e+01 3.875e+01, threshold=5.604e+01, percent-clipped=0.0 2023-12-21 16:20:52,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.62 vs. limit=10.0 2023-12-21 16:21:07,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2023-12-21 16:21:11,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=15.0 2023-12-21 16:21:11,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=123520.0, ans=0.125 2023-12-21 16:21:11,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=123520.0, ans=0.1 2023-12-21 16:21:27,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.33 vs. 
limit=22.5 2023-12-21 16:21:32,185 INFO [train.py:886] (1/4) Epoch 4, batch 4250, loss[loss=0.01552, audio_tagging_loss=0.01552, over 25000.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4949492.93 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:21:35,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123653.33333333333, ans=0.125 2023-12-21 16:21:37,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123653.33333333333, ans=0.1 2023-12-21 16:21:38,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=123653.33333333333, ans=0.125 2023-12-21 16:21:38,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=123653.33333333333, ans=0.95 2023-12-21 16:21:40,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. limit=6.0 2023-12-21 16:21:42,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=123720.0, ans=0.125 2023-12-21 16:21:49,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.72 vs. limit=22.5 2023-12-21 16:21:52,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123786.66666666667, ans=0.1 2023-12-21 16:22:09,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=123853.33333333333, ans=0.0 2023-12-21 16:22:13,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=123920.0, ans=0.2 2023-12-21 16:22:15,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=123920.0, ans=0.0 2023-12-21 16:22:23,666 INFO [train.py:886] (1/4) Epoch 4, batch 4300, loss[loss=0.01782, audio_tagging_loss=0.01782, over 25000.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 4949508.97 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:22:23,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=123986.66666666667, ans=10.0 2023-12-21 16:22:28,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=123986.66666666667, ans=0.125 2023-12-21 16:22:29,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.82 vs. 
limit=15.0 2023-12-21 16:22:33,905 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.616e+01 2.824e+01 3.044e+01 4.145e+01, threshold=5.649e+01, percent-clipped=0.0 2023-12-21 16:22:54,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=124186.66666666667, ans=0.125 2023-12-21 16:22:58,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=124186.66666666667, ans=0.125 2023-12-21 16:22:59,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=124186.66666666667, ans=0.125 2023-12-21 16:23:01,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=124186.66666666667, ans=0.125 2023-12-21 16:23:01,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=124186.66666666667, ans=0.2 2023-12-21 16:23:04,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=124253.33333333333, ans=0.2 2023-12-21 16:23:07,795 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.671e+00 2023-12-21 16:23:13,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-12-21 16:23:14,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0 2023-12-21 16:23:15,660 INFO [train.py:886] (1/4) Epoch 4, batch 4350, loss[loss=0.01665, audio_tagging_loss=0.01665, over 24750.00 frames. ], tot_loss[loss=0.01693, audio_tagging_loss=0.01693, over 4955107.14 frames. ], batch size: 99, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:23:31,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-21 16:23:38,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2023-12-21 16:23:58,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0 2023-12-21 16:24:07,055 INFO [train.py:886] (1/4) Epoch 4, batch 4400, loss[loss=0.01552, audio_tagging_loss=0.01552, over 24750.00 frames. ], tot_loss[loss=0.01693, audio_tagging_loss=0.01693, over 4953150.47 frames. 
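The constant grad_scale: 128.0 in the loss lines is the loss-scaling factor of mixed-precision training (use_fp16): the loss is multiplied by the scale before backward so small fp16 gradients do not underflow, gradients are unscaled before the optimizer step, and the scale grows after a run of good steps and halves on overflow. Here it has been parked at 128 for a long while. The standard PyTorch pattern, as a sketch; the model, optimizer, and batch handling are placeholders:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=128.0)

    def train_step(model, optimizer, features, targets):
        optimizer.zero_grad()
        with autocast(dtype=torch.float16):
            loss = model(features, targets)
        scaler.scale(loss).backward()  # gradients carry the 128x scale here
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grow after good steps, halve on overflow
        return loss.detach(), scaler.get_scale()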
], batch size: 99, lr: 2.39e-02, grad_scale: 128.0 2023-12-21 16:24:17,862 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.690e+01 2.854e+01 3.099e+01 4.055e+01, threshold=5.708e+01, percent-clipped=0.0 2023-12-21 16:24:26,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=124786.66666666667, ans=0.2 2023-12-21 16:24:28,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=124786.66666666667, ans=0.125 2023-12-21 16:24:30,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=15.0 2023-12-21 16:24:32,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-12-21 16:24:38,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-12-21 16:24:39,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=15.0 2023-12-21 16:24:58,568 INFO [train.py:886] (1/4) Epoch 4, batch 4450, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4951527.56 frames. ], batch size: 99, lr: 2.39e-02, grad_scale: 128.0 2023-12-21 16:25:01,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=124986.66666666667, ans=0.0 2023-12-21 16:25:08,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=125053.33333333333, ans=0.125 2023-12-21 16:25:08,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=15.0 2023-12-21 16:25:10,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125053.33333333333, ans=0.1 2023-12-21 16:25:10,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=125053.33333333333, ans=0.0 2023-12-21 16:25:13,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125053.33333333333, ans=0.125 2023-12-21 16:25:26,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=125120.0, ans=0.2 2023-12-21 16:25:27,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=125120.0, ans=0.0 2023-12-21 16:25:31,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=125186.66666666667, ans=0.0 2023-12-21 16:25:51,704 INFO [train.py:886] (1/4) Epoch 4, batch 4500, loss[loss=0.01765, audio_tagging_loss=0.01765, over 25000.00 frames. ], tot_loss[loss=0.01693, audio_tagging_loss=0.01693, over 4950431.77 frames. 
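The lr: field creeps down within the epoch (2.49e-02 near the top of this stretch, 2.39e-02 here) and, as the epoch-5 lines further below show, sits lower again after the epoch boundary, so the schedule decays jointly in batch count and in epoch count. That shape matches an Eden-style scheduler; a sketch of the formula, using the run's configured lr_batches=7500 and lr_epochs=3.5, with the exact constants and the way batches and epochs are counted treated as assumptions (it reproduces the magnitude and the trend of the logged values, not every digit):

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Smooth joint decay: roughly flat early, tending toward
        batch**-0.5 * epoch**-0.5 asymptotically."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    for epoch, batch in [(4, 15000), (5, 20000)]:
        print(f"epoch {epoch}: lr ~ {eden_lr(0.045, batch, epoch):.2e}")
    # -> ~2.4e-02 and ~2.0e-02, the right ballpark for these log lines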
], batch size: 100, lr: 2.39e-02, grad_scale: 128.0 2023-12-21 16:25:59,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=125320.0, ans=0.0 2023-12-21 16:26:01,672 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.563e+01 2.777e+01 3.038e+01 3.654e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-21 16:26:08,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0 2023-12-21 16:26:20,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125453.33333333333, ans=0.125 2023-12-21 16:26:22,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=125520.0, ans=0.125 2023-12-21 16:26:25,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-12-21 16:26:33,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=125586.66666666667, ans=0.125 2023-12-21 16:26:35,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=125586.66666666667, ans=0.125 2023-12-21 16:26:35,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2023-12-21 16:26:42,374 INFO [train.py:886] (1/4) Epoch 4, batch 4550, loss[loss=0.01761, audio_tagging_loss=0.01761, over 25000.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 4950926.29 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:27:12,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125853.33333333333, ans=0.1 2023-12-21 16:27:17,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=125853.33333333333, ans=0.125 2023-12-21 16:27:30,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.95 vs. limit=22.5 2023-12-21 16:27:35,640 INFO [train.py:886] (1/4) Epoch 4, batch 4600, loss[loss=0.01762, audio_tagging_loss=0.01762, over 24906.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4949902.04 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:27:38,755 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.142e+01 2023-12-21 16:27:38,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=125986.66666666667, ans=0.125 2023-12-21 16:27:38,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125986.66666666667, ans=0.1 2023-12-21 16:27:40,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. 
limit=15.0 2023-12-21 16:27:45,211 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.614e+01 2.813e+01 2.992e+01 3.999e+01, threshold=5.626e+01, percent-clipped=0.0 2023-12-21 16:27:45,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=126053.33333333333, ans=0.125 2023-12-21 16:27:47,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=15.0 2023-12-21 16:27:54,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126053.33333333333, ans=0.1 2023-12-21 16:27:58,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=126120.0, ans=0.125 2023-12-21 16:28:06,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=126186.66666666667, ans=0.125 2023-12-21 16:28:17,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=126253.33333333333, ans=0.125 2023-12-21 16:28:27,570 INFO [train.py:886] (1/4) Epoch 4, batch 4650, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01688, audio_tagging_loss=0.01688, over 4952688.86 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:28:30,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.52 vs. limit=22.5 2023-12-21 16:28:34,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=126320.0, ans=0.0 2023-12-21 16:28:50,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=126453.33333333333, ans=0.125 2023-12-21 16:29:10,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=126586.66666666667, ans=0.125 2023-12-21 16:29:11,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126586.66666666667, ans=0.1 2023-12-21 16:29:18,451 INFO [train.py:886] (1/4) Epoch 4, batch 4700, loss[loss=0.01786, audio_tagging_loss=0.01786, over 24750.00 frames. ], tot_loss[loss=0.01702, audio_tagging_loss=0.01702, over 4954352.82 frames. ], batch size: 99, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:29:21,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=126653.33333333333, ans=0.0 2023-12-21 16:29:27,728 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.637e+01 2.864e+01 3.107e+01 3.954e+01, threshold=5.728e+01, percent-clipped=0.0 2023-12-21 16:29:36,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=126786.66666666667, ans=0.125 2023-12-21 16:29:39,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=126786.66666666667, ans=0.125 2023-12-21 16:30:05,317 INFO [train.py:886] (1/4) Epoch 4, batch 4750, loss[loss=0.01947, audio_tagging_loss=0.01947, over 24750.00 frames. 
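The *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, ff3_skip_rate, bypass.skip_rate) are stochastic-depth style regularizers: with the given probability a submodule is skipped outright for the batch. The rates are ScheduledFloats, which is why nearly all of them read ans=0.0 by this point (batch_count around 126k), while the bypass.skip_rate entries nearby keep small residual values. A sketch of the mechanism; the helper below is illustrative:

    import torch

    def maybe_skip(x: torch.Tensor, module, skip_rate: float,
                   training: bool) -> torch.Tensor:
        """With probability skip_rate during training, bypass the module
        (identity); otherwise run it. skip_rate is annealed toward 0,
        as in the *_skip_rate log entries above."""
        if training and float(torch.rand(())) < skip_rate:
            return x
        return module(x)

    ff = torch.nn.Linear(256, 256)
    x = torch.randn(10, 256)
    y = maybe_skip(x, ff, skip_rate=0.0, training=True)  # always runs ff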
], tot_loss[loss=0.01717, audio_tagging_loss=0.01717, over 4948603.15 frames. ], batch size: 99, lr: 2.37e-02, grad_scale: 128.0 2023-12-21 16:30:09,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.08 vs. limit=22.5 2023-12-21 16:30:42,783 INFO [train.py:886] (1/4) Epoch 5, batch 0, loss[loss=0.04546, audio_tagging_loss=0.04546, over 23954.00 frames. ], tot_loss[loss=0.04546, audio_tagging_loss=0.04546, over 23954.00 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 128.0 2023-12-21 16:30:42,784 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 16:31:04,474 INFO [train.py:917] (1/4) Epoch 5, validation: loss=0.03772, audio_tagging_loss=0.03772, over 3737520.00 frames. 2023-12-21 16:31:04,474 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 16:31:11,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127093.33333333333, ans=0.1 2023-12-21 16:31:11,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2023-12-21 16:31:17,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127160.0, ans=0.1 2023-12-21 16:31:21,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=127160.0, ans=15.0 2023-12-21 16:31:30,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=127226.66666666667, ans=0.05 2023-12-21 16:31:39,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.83 vs. limit=22.5 2023-12-21 16:31:44,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.73 vs. limit=22.5 2023-12-21 16:31:47,042 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.748e+01 3.184e+01 3.727e+01 1.037e+02, threshold=6.368e+01, percent-clipped=5.0 2023-12-21 16:31:52,725 INFO [train.py:886] (1/4) Epoch 5, batch 50, loss[loss=0.02128, audio_tagging_loss=0.02128, over 25000.00 frames. ], tot_loss[loss=0.02691, audio_tagging_loss=0.02691, over 1116084.76 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 128.0 2023-12-21 16:31:52,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127426.66666666667, ans=0.0 2023-12-21 16:32:11,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.83 vs. limit=22.5 2023-12-21 16:32:18,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5 2023-12-21 16:32:23,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=127626.66666666667, ans=0.125 2023-12-21 16:32:43,325 INFO [train.py:886] (1/4) Epoch 5, batch 100, loss[loss=0.01966, audio_tagging_loss=0.01966, over 25000.00 frames. ], tot_loss[loss=0.02344, audio_tagging_loss=0.02344, over 1965976.86 frames. 
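At the epoch 4 to epoch 5 boundary above, the recipe immediately runs validation (train.py:909/917) and reports peak device memory (train.py:918): the unchanged 14765MB and the fixed 3737520-frame validation set are quick sanity checks that neither the data nor the memory footprint shifted between epochs, and validation loss keeps improving (0.04177 at epoch 4 versus 0.03772 here). Note also that the clipping warning just before epoch 5 batch 50 is the first in this stretch with percent-clipped above zero (5.0, with a 1.037e+02 max norm), consistent with the loss spike at the epoch restart. A sketch of both logging pieces; model_forward and the batch structure are placeholders for this recipe's actual audio-tagging computation:

    import torch

    def model_forward(model, batch, device):
        """Placeholder loss computation; assumes batch is a
        (features, num_frames) pair."""
        features, num_frames = batch
        return model(features.to(device)).mean(), float(num_frames)

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, device="cuda:1"):
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for batch in valid_loader:
            loss, num_frames = model_forward(model, batch, device)
            loss_sum += loss.item() * num_frames
            frames += num_frames
        model.train()
        return loss_sum / frames, frames   # e.g. (0.03772, 3737520.0)

    def log_peak_memory(device: int = 1) -> None:
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")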
], batch size: 100, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:32:46,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=127760.0, ans=0.0
2023-12-21 16:32:46,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.89 vs. limit=22.5
2023-12-21 16:32:57,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.07 vs. limit=22.5
2023-12-21 16:33:03,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=127893.33333333333, ans=0.125
2023-12-21 16:33:08,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=127893.33333333333, ans=0.2
2023-12-21 16:33:20,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=127960.0, ans=0.125
2023-12-21 16:33:21,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=128026.66666666667, ans=0.025
2023-12-21 16:33:26,373 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.735e+01 2.959e+01 3.135e+01 3.807e+01, threshold=5.918e+01, percent-clipped=0.0
2023-12-21 16:33:32,082 INFO [train.py:886] (1/4) Epoch 5, batch 150, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.02125, audio_tagging_loss=0.02125, over 2631511.98 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:33:40,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=128093.33333333333, ans=0.0
2023-12-21 16:33:44,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=128160.0, ans=0.0
2023-12-21 16:33:52,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128226.66666666667, ans=0.1
2023-12-21 16:33:57,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.0
2023-12-21 16:34:06,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.09 vs. limit=22.5
2023-12-21 16:34:07,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=128293.33333333333, ans=0.125
2023-12-21 16:34:08,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=128293.33333333333, ans=0.125
2023-12-21 16:34:23,646 INFO [train.py:886] (1/4) Epoch 5, batch 200, loss[loss=0.01911, audio_tagging_loss=0.01911, over 24750.00 frames. ], tot_loss[loss=0.01982, audio_tagging_loss=0.01982, over 3150369.67 frames. ], batch size: 99, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:34:24,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.95 vs. limit=22.5
2023-12-21 16:34:24,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=128426.66666666667, ans=0.04949747468305833
2023-12-21 16:34:32,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=128493.33333333333, ans=0.2
2023-12-21 16:34:34,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=128493.33333333333, ans=0.1
2023-12-21 16:34:50,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0
2023-12-21 16:35:07,584 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.571e+01 2.693e+01 2.979e+01 3.922e+01, threshold=5.386e+01, percent-clipped=0.0
2023-12-21 16:35:13,424 INFO [train.py:886] (1/4) Epoch 5, batch 250, loss[loss=0.0166, audio_tagging_loss=0.0166, over 22515.00 frames. ], tot_loss[loss=0.01914, audio_tagging_loss=0.01914, over 3555099.27 frames. ], batch size: 107, lr: 2.20e-02, grad_scale: 128.0
2023-12-21 16:35:43,691 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.305e+00
2023-12-21 16:36:04,549 INFO [train.py:886] (1/4) Epoch 5, batch 300, loss[loss=0.01598, audio_tagging_loss=0.01598, over 24750.00 frames. ], tot_loss[loss=0.01858, audio_tagging_loss=0.01858, over 3860098.14 frames. ], batch size: 99, lr: 2.19e-02, grad_scale: 128.0
2023-12-21 16:36:22,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129160.0, ans=0.1
2023-12-21 16:36:31,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0
2023-12-21 16:36:42,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.71 vs. limit=15.0
2023-12-21 16:36:50,050 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.527e+01 2.747e+01 2.947e+01 3.578e+01, threshold=5.493e+01, percent-clipped=0.0
2023-12-21 16:36:55,747 INFO [train.py:886] (1/4) Epoch 5, batch 350, loss[loss=0.01696, audio_tagging_loss=0.01696, over 24750.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4094910.64 frames. ], batch size: 99, lr: 2.19e-02, grad_scale: 128.0
2023-12-21 16:36:58,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=129426.66666666667, ans=0.025
2023-12-21 16:37:15,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0
2023-12-21 16:37:21,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=129560.0, ans=0.0
2023-12-21 16:37:23,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0
2023-12-21 16:37:31,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=129626.66666666667, ans=0.0
2023-12-21 16:37:46,206 INFO [train.py:886] (1/4) Epoch 5, batch 400, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.01778, audio_tagging_loss=0.01778, over 4280512.07 frames. ], batch size: 99, lr: 2.19e-02, grad_scale: 128.0
2023-12-21 16:38:04,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129826.66666666667, ans=0.1
2023-12-21 16:38:17,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.97 vs. limit=22.5
2023-12-21 16:38:31,457 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.542e+01 2.726e+01 2.924e+01 4.010e+01, threshold=5.453e+01, percent-clipped=0.0
2023-12-21 16:38:35,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=130026.66666666667, ans=0.2
2023-12-21 16:38:36,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=130026.66666666667, ans=0.0
2023-12-21 16:38:37,880 INFO [train.py:886] (1/4) Epoch 5, batch 450, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01742, audio_tagging_loss=0.01742, over 4431431.17 frames. ], batch size: 100, lr: 2.19e-02, grad_scale: 128.0
2023-12-21 16:38:40,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=130093.33333333333, ans=0.125
2023-12-21 16:38:47,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0
2023-12-21 16:38:48,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130160.0, ans=0.1
2023-12-21 16:39:04,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=130226.66666666667, ans=0.0
2023-12-21 16:39:14,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=130293.33333333333, ans=0.125
2023-12-21 16:39:26,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0
2023-12-21 16:39:28,827 INFO [train.py:886] (1/4) Epoch 5, batch 500, loss[loss=0.01548, audio_tagging_loss=0.01548, over 21816.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 4545961.36 frames. ], batch size: 107, lr: 2.18e-02, grad_scale: 128.0
2023-12-21 16:39:31,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=130426.66666666667, ans=0.0
2023-12-21 16:39:57,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.11 vs. limit=10.0
2023-12-21 16:40:05,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=130626.66666666667, ans=0.07
2023-12-21 16:40:13,608 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.467e+01 2.664e+01 2.880e+01 3.369e+01, threshold=5.329e+01, percent-clipped=0.0
2023-12-21 16:40:19,468 INFO [train.py:886] (1/4) Epoch 5, batch 550, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4640122.13 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0
2023-12-21 16:40:30,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0
2023-12-21 16:40:35,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=130826.66666666667, ans=15.0
2023-12-21 16:40:43,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=130893.33333333333, ans=0.0
2023-12-21 16:40:58,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=131026.66666666667, ans=0.1
2023-12-21 16:41:10,430 INFO [train.py:886] (1/4) Epoch 5, batch 600, loss[loss=0.01946, audio_tagging_loss=0.01946, over 24946.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 4710611.84 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0
2023-12-21 16:41:13,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=131093.33333333334, ans=0.125
2023-12-21 16:41:14,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0
2023-12-21 16:41:25,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=131160.0, ans=0.125
2023-12-21 16:41:27,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=131160.0, ans=0.0
2023-12-21 16:41:28,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=131160.0, ans=0.125
2023-12-21 16:41:39,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131293.33333333334, ans=0.1
2023-12-21 16:41:43,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=15.0
2023-12-21 16:41:44,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=131293.33333333334, ans=0.0
2023-12-21 16:41:53,920 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.543e+01 2.791e+01 2.900e+01 3.878e+01, threshold=5.581e+01, percent-clipped=0.0
2023-12-21 16:41:55,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=131360.0, ans=0.0
2023-12-21 16:42:00,252 INFO [train.py:886] (1/4) Epoch 5, batch 650, loss[loss=0.01941, audio_tagging_loss=0.01941, over 22970.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4755685.06 frames. ], batch size: 107, lr: 2.18e-02, grad_scale: 128.0
2023-12-21 16:42:07,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=131426.66666666666, ans=0.125
2023-12-21 16:42:18,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=131493.33333333334, ans=0.125
2023-12-21 16:42:27,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=131560.0, ans=0.125
2023-12-21 16:42:28,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.25 vs. limit=15.0
2023-12-21 16:42:37,574 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.805e+00
2023-12-21 16:42:38,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=131626.66666666666, ans=0.0
2023-12-21 16:42:45,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131693.33333333334, ans=0.1
2023-12-21 16:42:49,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131760.0, ans=0.1
2023-12-21 16:42:50,310 INFO [train.py:886] (1/4) Epoch 5, batch 700, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 4793966.88 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0
2023-12-21 16:42:50,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=131760.0, ans=0.125
2023-12-21 16:43:06,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=131826.66666666666, ans=0.0
2023-12-21 16:43:17,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=15.0
2023-12-21 16:43:27,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=131960.0, ans=0.0
2023-12-21 16:43:35,718 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.570e+01 2.719e+01 2.922e+01 3.851e+01, threshold=5.438e+01, percent-clipped=0.0
2023-12-21 16:43:39,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=132026.66666666666, ans=0.0
2023-12-21 16:43:41,361 INFO [train.py:886] (1/4) Epoch 5, batch 750, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 4829726.46 frames. ], batch size: 100, lr: 2.17e-02, grad_scale: 128.0
2023-12-21 16:43:43,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=22.5
2023-12-21 16:44:14,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=132293.33333333334, ans=0.125
2023-12-21 16:44:17,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=132293.33333333334, ans=0.0
2023-12-21 16:44:18,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=132293.33333333334, ans=0.2
2023-12-21 16:44:18,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=15.0
2023-12-21 16:44:31,582 INFO [train.py:886] (1/4) Epoch 5, batch 800, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24750.00 frames. ], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 4854752.70 frames. ], batch size: 99, lr: 2.17e-02, grad_scale: 128.0
2023-12-21 16:44:33,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5
2023-12-21 16:44:42,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=132493.33333333334, ans=0.0
2023-12-21 16:44:54,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=132560.0, ans=0.125
2023-12-21 16:45:18,289 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.636e+01 2.805e+01 3.064e+01 4.001e+01, threshold=5.610e+01, percent-clipped=0.0
2023-12-21 16:45:23,044 INFO [train.py:886] (1/4) Epoch 5, batch 850, loss[loss=0.01676, audio_tagging_loss=0.01676, over 25000.00 frames. ], tot_loss[loss=0.01679, audio_tagging_loss=0.01679, over 4876493.48 frames. ], batch size: 100, lr: 2.17e-02, grad_scale: 128.0
2023-12-21 16:45:29,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=132760.0, ans=0.125
2023-12-21 16:45:37,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0
2023-12-21 16:45:47,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=132893.33333333334, ans=0.125
2023-12-21 16:45:52,792 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.422e+00
2023-12-21 16:46:00,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.26 vs. limit=10.0
2023-12-21 16:46:12,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0
2023-12-21 16:46:14,110 INFO [train.py:886] (1/4) Epoch 5, batch 900, loss[loss=0.01594, audio_tagging_loss=0.01594, over 24750.00 frames. ], tot_loss[loss=0.01679, audio_tagging_loss=0.01679, over 4899618.49 frames. ], batch size: 99, lr: 2.17e-02, grad_scale: 128.0
2023-12-21 16:46:15,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.19 vs. limit=22.5
2023-12-21 16:46:20,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=133093.33333333334, ans=0.125
2023-12-21 16:46:40,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0
2023-12-21 16:46:42,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=133226.66666666666, ans=0.95
2023-12-21 16:46:48,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=133293.33333333334, ans=0.05
2023-12-21 16:47:01,327 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.545e+01 2.742e+01 2.943e+01 3.695e+01, threshold=5.484e+01, percent-clipped=0.0
2023-12-21 16:47:06,128 INFO [train.py:886] (1/4) Epoch 5, batch 950, loss[loss=0.01804, audio_tagging_loss=0.01804, over 24750.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4910695.63 frames. ], batch size: 99, lr: 2.16e-02, grad_scale: 128.0
2023-12-21 16:47:09,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=133426.66666666666, ans=0.0
2023-12-21 16:47:14,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0
2023-12-21 16:47:23,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0
2023-12-21 16:47:31,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2023-12-21 16:47:42,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=133626.66666666666, ans=0.125
2023-12-21 16:47:42,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=15.0
2023-12-21 16:47:46,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=133693.33333333334, ans=0.0
2023-12-21 16:47:51,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=133693.33333333334, ans=0.0
2023-12-21 16:47:57,564 INFO [train.py:886] (1/4) Epoch 5, batch 1000, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01696, audio_tagging_loss=0.01696, over 4915294.50 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0
2023-12-21 16:48:04,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133760.0, ans=0.1
2023-12-21 16:48:13,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=133826.66666666666, ans=0.0
2023-12-21 16:48:16,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=133826.66666666666, ans=0.07
2023-12-21 16:48:26,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=133893.33333333334, ans=0.125
2023-12-21 16:48:33,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133960.0, ans=0.1
2023-12-21 16:48:41,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0
2023-12-21 16:48:42,464 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.511e+01 2.666e+01 2.885e+01 3.641e+01, threshold=5.332e+01, percent-clipped=0.0
2023-12-21 16:48:48,794 INFO [train.py:886] (1/4) Epoch 5, batch 1050, loss[loss=0.01554, audio_tagging_loss=0.01554, over 25000.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 4923615.97 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0
2023-12-21 16:49:25,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0
2023-12-21 16:49:30,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134360.0, ans=0.125
2023-12-21 16:49:38,679 INFO [train.py:886] (1/4) Epoch 5, batch 1100, loss[loss=0.01996, audio_tagging_loss=0.01996, over 25000.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 4931724.29 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0
2023-12-21 16:49:39,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0
2023-12-21 16:49:42,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=15.0
2023-12-21 16:49:43,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=134426.66666666666, ans=0.2
2023-12-21 16:49:54,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0
2023-12-21 16:50:10,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=134626.66666666666, ans=0.125
2023-12-21 16:50:12,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=134626.66666666666, ans=0.125
2023-12-21 16:50:25,435 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.502e+01 2.717e+01 2.945e+01 3.841e+01, threshold=5.435e+01, percent-clipped=0.0
2023-12-21 16:50:28,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=134693.33333333334, ans=0.2
2023-12-21 16:50:30,927 INFO [train.py:886] (1/4) Epoch 5, batch 1150, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4937369.65 frames. ], batch size: 100, lr: 2.15e-02, grad_scale: 128.0
2023-12-21 16:50:31,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.55 vs. limit=15.0
2023-12-21 16:50:43,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=134826.66666666666, ans=0.0
2023-12-21 16:50:43,466 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.353e+00
2023-12-21 16:50:49,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0
2023-12-21 16:50:50,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=134893.33333333334, ans=0.0
2023-12-21 16:50:55,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=134893.33333333334, ans=0.125
2023-12-21 16:51:06,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=134960.0, ans=0.125
2023-12-21 16:51:11,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=135026.66666666666, ans=0.0
2023-12-21 16:51:16,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135026.66666666666, ans=0.1
2023-12-21 16:51:21,046 INFO [train.py:886] (1/4) Epoch 5, batch 1200, loss[loss=0.01844, audio_tagging_loss=0.01844, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4943768.54 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0
2023-12-21 16:51:22,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.08 vs. limit=22.5
2023-12-21 16:51:27,665 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.278e+01
2023-12-21 16:51:39,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0
2023-12-21 16:51:57,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135293.33333333334, ans=0.1
2023-12-21 16:52:07,530 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.551e+01 2.723e+01 2.888e+01 4.328e+01, threshold=5.445e+01, percent-clipped=0.0
2023-12-21 16:52:12,138 INFO [train.py:886] (1/4) Epoch 5, batch 1250, loss[loss=0.01714, audio_tagging_loss=0.01714, over 24750.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 4937314.64 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0
2023-12-21 16:52:12,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0
2023-12-21 16:52:26,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=135493.33333333334, ans=0.125
2023-12-21 16:52:28,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0
2023-12-21 16:52:41,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0
2023-12-21 16:52:43,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0
2023-12-21 16:52:57,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=135693.33333333334, ans=0.125
2023-12-21 16:53:04,284 INFO [train.py:886] (1/4) Epoch 5, batch 1300, loss[loss=0.02006, audio_tagging_loss=0.02006, over 24750.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 4935357.84 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0
2023-12-21 16:53:05,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5
2023-12-21 16:53:20,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=135826.66666666666, ans=10.0
2023-12-21 16:53:49,039 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.561e+01 2.737e+01 2.945e+01 3.593e+01, threshold=5.474e+01, percent-clipped=0.0
2023-12-21 16:53:49,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=136026.66666666666, ans=0.125
2023-12-21 16:53:52,186 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.653e+00
2023-12-21 16:53:53,816 INFO [train.py:886] (1/4) Epoch 5, batch 1350, loss[loss=0.01711, audio_tagging_loss=0.01711, over 25000.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4935626.63 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0
2023-12-21 16:53:55,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=136093.33333333334, ans=0.125
2023-12-21 16:53:56,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136093.33333333334, ans=0.1
2023-12-21 16:53:59,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136093.33333333334, ans=0.0
2023-12-21 16:54:16,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=136226.66666666666, ans=0.125
2023-12-21 16:54:31,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.56 vs. limit=6.0
2023-12-21 16:54:34,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=136360.0, ans=0.0
2023-12-21 16:54:36,234 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 16:54:46,090 INFO [train.py:886] (1/4) Epoch 5, batch 1400, loss[loss=0.01627, audio_tagging_loss=0.01627, over 25000.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 4946125.76 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0
2023-12-21 16:54:49,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=136426.66666666666, ans=0.0
2023-12-21 16:54:58,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=136493.33333333334, ans=0.0
2023-12-21 16:55:07,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=136560.0, ans=0.025
2023-12-21 16:55:19,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136626.66666666666, ans=0.125
2023-12-21 16:55:20,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=136626.66666666666, ans=0.015
2023-12-21 16:55:22,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=136626.66666666666, ans=0.125
2023-12-21 16:55:31,496 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.570e+01 2.791e+01 2.991e+01 3.776e+01, threshold=5.582e+01, percent-clipped=0.0
2023-12-21 16:55:36,235 INFO [train.py:886] (1/4) Epoch 5, batch 1450, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4951589.11 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0
2023-12-21 16:55:51,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=136826.66666666666, ans=0.0
2023-12-21 16:55:54,591 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 16:56:05,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.0
2023-12-21 16:56:14,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=136960.0, ans=0.0
2023-12-21 16:56:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=137026.66666666666, ans=0.125
2023-12-21 16:56:29,138 INFO [train.py:886] (1/4) Epoch 5, batch 1500, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4954753.01 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0
2023-12-21 16:56:37,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.71 vs. limit=15.0
2023-12-21 16:56:42,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=137160.0, ans=0.0
2023-12-21 16:56:42,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137160.0, ans=0.125
2023-12-21 16:56:45,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=137160.0, ans=0.5
2023-12-21 16:56:51,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=15.0
2023-12-21 16:57:05,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=15.0
2023-12-21 16:57:12,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=137360.0, ans=15.0
2023-12-21 16:57:14,690 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.623e+01 2.830e+01 3.033e+01 3.476e+01, threshold=5.660e+01, percent-clipped=0.0
2023-12-21 16:57:20,727 INFO [train.py:886] (1/4) Epoch 5, batch 1550, loss[loss=0.01617, audio_tagging_loss=0.01617, over 24750.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 4954852.38 frames. ], batch size: 99, lr: 2.14e-02, grad_scale: 128.0
2023-12-21 16:57:37,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0
2023-12-21 16:57:48,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137560.0, ans=0.1
2023-12-21 16:58:00,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0
2023-12-21 16:58:06,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=137693.33333333334, ans=0.2
2023-12-21 16:58:10,332 INFO [train.py:886] (1/4) Epoch 5, batch 1600, loss[loss=0.01661, audio_tagging_loss=0.01661, over 24750.00 frames. ], tot_loss[loss=0.01693, audio_tagging_loss=0.01693, over 4945728.90 frames. ], batch size: 99, lr: 2.13e-02, grad_scale: 128.0
2023-12-21 16:58:12,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0
2023-12-21 16:58:18,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=137760.0, ans=0.0
2023-12-21 16:58:29,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137826.66666666666, ans=0.1
2023-12-21 16:58:41,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=15.0
2023-12-21 16:58:44,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=137960.0, ans=0.125
2023-12-21 16:58:55,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=138026.66666666666, ans=0.125
2023-12-21 16:58:56,101 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.601e+01 2.757e+01 2.954e+01 3.912e+01, threshold=5.513e+01, percent-clipped=0.0
2023-12-21 16:59:01,654 INFO [train.py:886] (1/4) Epoch 5, batch 1650, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 4946519.62 frames. ], batch size: 99, lr: 2.13e-02, grad_scale: 128.0
2023-12-21 16:59:11,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138160.0, ans=0.1
2023-12-21 16:59:15,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=138160.0, ans=0.0
2023-12-21 16:59:35,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=138293.33333333334, ans=0.0
2023-12-21 16:59:39,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=138293.33333333334, ans=0.5
2023-12-21 16:59:48,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=138360.0, ans=0.5
2023-12-21 16:59:52,670 INFO [train.py:886] (1/4) Epoch 5, batch 1700, loss[loss=0.01714, audio_tagging_loss=0.01714, over 25000.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 4948605.20 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 128.0
2023-12-21 16:59:59,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=138426.66666666666, ans=0.2
2023-12-21 17:00:00,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.61 vs. limit=15.0
2023-12-21 17:00:10,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=138493.33333333334, ans=0.125
2023-12-21 17:00:12,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0
2023-12-21 17:00:18,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=138560.0, ans=0.125
2023-12-21 17:00:19,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.102e+00
2023-12-21 17:00:24,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=138626.66666666666, ans=0.2
2023-12-21 17:00:27,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=138626.66666666666, ans=0.0
2023-12-21 17:00:32,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0
2023-12-21 17:00:40,305 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.593e+01 2.813e+01 3.022e+01 3.717e+01, threshold=5.626e+01, percent-clipped=0.0
2023-12-21 17:00:43,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0
2023-12-21 17:00:44,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0
2023-12-21 17:00:45,136 INFO [train.py:886] (1/4) Epoch 5, batch 1750, loss[loss=0.01831, audio_tagging_loss=0.01831, over 25000.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 4948520.22 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 128.0
2023-12-21 17:00:52,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=8.0
2023-12-21 17:01:06,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=138893.33333333334, ans=0.125
2023-12-21 17:01:06,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.97 vs. limit=15.0
2023-12-21 17:01:07,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=138893.33333333334, ans=0.05
2023-12-21 17:01:15,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=138960.0, ans=0.125
2023-12-21 17:01:15,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=138960.0, ans=0.125
2023-12-21 17:01:16,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0
2023-12-21 17:01:16,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=138960.0, ans=0.125
2023-12-21 17:01:20,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=138960.0, ans=0.125
2023-12-21 17:01:26,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5
2023-12-21 17:01:37,538 INFO [train.py:886] (1/4) Epoch 5, batch 1800, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4953512.69 frames. ], batch size: 100, lr: 2.12e-02, grad_scale: 128.0
2023-12-21 17:01:48,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=139160.0, ans=0.05
2023-12-21 17:01:51,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139160.0, ans=0.125
2023-12-21 17:02:10,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=139293.33333333334, ans=0.04949747468305833
2023-12-21 17:02:23,764 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.569e+01 2.782e+01 2.969e+01 3.788e+01, threshold=5.564e+01, percent-clipped=0.0
2023-12-21 17:02:28,445 INFO [train.py:886] (1/4) Epoch 5, batch 1850, loss[loss=0.01715, audio_tagging_loss=0.01715, over 24750.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 4954809.33 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0
2023-12-21 17:02:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=139426.66666666666, ans=0.0
2023-12-21 17:02:32,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139426.66666666666, ans=0.125
2023-12-21 17:02:44,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=139493.33333333334, ans=0.125
2023-12-21 17:02:55,217 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.037e-02
2023-12-21 17:03:02,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=139626.66666666666, ans=0.0
2023-12-21 17:03:19,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=139760.0, ans=0.0
2023-12-21 17:03:19,903 INFO [train.py:886] (1/4) Epoch 5, batch 1900, loss[loss=0.01763, audio_tagging_loss=0.01763, over 24750.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 4945742.95 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0
2023-12-21 17:03:21,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=139760.0, ans=0.125
2023-12-21 17:03:26,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=139760.0, ans=0.0
2023-12-21 17:03:26,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=139760.0, ans=0.0
2023-12-21 17:03:39,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=139893.33333333334, ans=0.1
2023-12-21 17:03:45,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0
2023-12-21 17:03:46,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=139893.33333333334, ans=10.0
2023-12-21 17:04:02,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.28 vs. limit=10.0
2023-12-21 17:04:05,393 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.621e+01 2.837e+01 3.063e+01 3.713e+01, threshold=5.675e+01, percent-clipped=0.0
2023-12-21 17:04:11,611 INFO [train.py:886] (1/4) Epoch 5, batch 1950, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24750.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 4948996.37 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0
2023-12-21 17:04:23,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0
2023-12-21 17:04:24,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=140160.0, ans=0.125
2023-12-21 17:04:41,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140293.33333333334, ans=0.1
2023-12-21 17:04:43,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=140293.33333333334, ans=0.0
2023-12-21 17:05:02,943 INFO [train.py:886] (1/4) Epoch 5, batch 2000, loss[loss=0.01742, audio_tagging_loss=0.01742, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4947288.77 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:05:10,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140426.66666666666, ans=0.125
2023-12-21 17:05:24,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=140560.0, ans=0.035
2023-12-21 17:05:25,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=140560.0, ans=0.125
2023-12-21 17:05:28,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=140560.0, ans=0.2
2023-12-21 17:05:32,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=140626.66666666666, ans=0.1
2023-12-21 17:05:36,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=140626.66666666666, ans=0.09899494936611666
2023-12-21 17:05:45,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=140693.33333333334, ans=0.125
2023-12-21 17:05:46,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=140693.33333333334, ans=0.125
2023-12-21 17:05:49,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2023-12-21 17:05:51,609 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.499e+01 2.652e+01 2.874e+01 3.620e+01, threshold=5.305e+01, percent-clipped=0.0
2023-12-21 17:05:55,403 INFO [train.py:886] (1/4) Epoch 5, batch 2050, loss[loss=0.01766, audio_tagging_loss=0.01766, over 25000.00 frames. ], tot_loss[loss=0.01657, audio_tagging_loss=0.01657, over 4948861.44 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:05:55,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0
2023-12-21 17:05:56,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=140760.0, ans=0.0
2023-12-21 17:06:02,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0
2023-12-21 17:06:03,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=140760.0, ans=0.125
2023-12-21 17:06:11,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=140826.66666666666, ans=0.125
2023-12-21 17:06:14,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=140893.33333333334, ans=0.2
2023-12-21 17:06:16,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=140893.33333333334, ans=0.125
2023-12-21 17:06:27,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140960.0, ans=0.125
2023-12-21 17:06:30,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.20 vs. limit=10.0
2023-12-21 17:06:36,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141026.66666666666, ans=0.1
2023-12-21 17:06:45,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=141093.33333333334, ans=0.0
2023-12-21 17:06:46,197 INFO [train.py:886] (1/4) Epoch 5, batch 2100, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4952600.25 frames. ], batch size: 99, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:06:49,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5
2023-12-21 17:07:07,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=141226.66666666666, ans=0.0
2023-12-21 17:07:07,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=141226.66666666666, ans=0.2
2023-12-21 17:07:25,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=141293.33333333334, ans=0.5
2023-12-21 17:07:31,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=141360.0, ans=0.0
2023-12-21 17:07:34,669 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.574e+01 2.738e+01 2.896e+01 3.657e+01, threshold=5.477e+01, percent-clipped=0.0
2023-12-21 17:07:38,498 INFO [train.py:886] (1/4) Epoch 5, batch 2150, loss[loss=0.01649, audio_tagging_loss=0.01649, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4947537.61 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:07:49,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=141493.33333333334, ans=0.125
2023-12-21 17:07:55,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=141493.33333333334, ans=0.125
2023-12-21 17:07:58,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=15.0
2023-12-21 17:08:02,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=141560.0, ans=0.125
2023-12-21 17:08:02,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=141560.0, ans=0.125
2023-12-21 17:08:07,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=141560.0, ans=0.025
2023-12-21 17:08:11,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=141626.66666666666, ans=0.2
2023-12-21 17:08:22,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.60 vs. limit=15.0
2023-12-21 17:08:31,225 INFO [train.py:886] (1/4) Epoch 5, batch 2200, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4946272.69 frames. ], batch size: 99, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:08:33,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.12 vs. limit=6.0
2023-12-21 17:08:35,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0
2023-12-21 17:08:39,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=141826.66666666666, ans=0.07
2023-12-21 17:08:40,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=141826.66666666666, ans=0.125
2023-12-21 17:08:46,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.43 vs. limit=15.0
2023-12-21 17:08:49,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141893.33333333334, ans=0.1
2023-12-21 17:08:50,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141893.33333333334, ans=0.1
2023-12-21 17:08:58,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=141893.33333333334, ans=0.125
2023-12-21 17:09:06,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=141960.0, ans=0.2
2023-12-21 17:09:17,620 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.618e+01 2.828e+01 3.065e+01 3.912e+01, threshold=5.656e+01, percent-clipped=0.0
2023-12-21 17:09:21,475 INFO [train.py:886] (1/4) Epoch 5, batch 2250, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4939228.26 frames. ], batch size: 99, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:09:38,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0
2023-12-21 17:09:51,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=142293.33333333334, ans=0.5
2023-12-21 17:09:55,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0
2023-12-21 17:10:07,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=142360.0, ans=0.125
2023-12-21 17:10:07,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=142360.0, ans=0.125
2023-12-21 17:10:14,392 INFO [train.py:886] (1/4) Epoch 5, batch 2300, loss[loss=0.01622, audio_tagging_loss=0.01622, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4943894.06 frames. ], batch size: 100, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:10:35,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=142560.0, ans=0.2
2023-12-21 17:10:36,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142560.0, ans=0.125
2023-12-21 17:10:38,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=142560.0, ans=0.125
2023-12-21 17:10:38,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=142560.0, ans=0.125
2023-12-21 17:10:50,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=142626.66666666666, ans=0.125
2023-12-21 17:10:54,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=142693.33333333334, ans=0.0
2023-12-21 17:11:00,693 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.568e+01 2.751e+01 2.902e+01 4.993e+01, threshold=5.502e+01, percent-clipped=0.0
2023-12-21 17:11:05,213 INFO [train.py:886] (1/4) Epoch 5, batch 2350, loss[loss=0.01643, audio_tagging_loss=0.01643, over 24750.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4943123.17 frames. ], batch size: 99, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:11:14,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142760.0, ans=0.1
2023-12-21 17:11:16,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0
2023-12-21 17:11:19,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=142826.66666666666, ans=0.125
2023-12-21 17:11:41,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=142960.0, ans=0.0
2023-12-21 17:11:44,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=142960.0, ans=0.0
2023-12-21 17:11:49,631 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 17:11:51,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=143026.66666666666, ans=0.0
2023-12-21 17:11:57,072 INFO [train.py:886] (1/4) Epoch 5, batch 2400, loss[loss=0.01767, audio_tagging_loss=0.01767, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4949928.34 frames. ], batch size: 100, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:12:09,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143160.0, ans=0.1
2023-12-21 17:12:21,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0
2023-12-21 17:12:33,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143293.33333333334, ans=0.1
2023-12-21 17:12:36,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=12.0
2023-12-21 17:12:44,895 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.521e+01 2.685e+01 2.894e+01 4.113e+01, threshold=5.370e+01, percent-clipped=0.0
2023-12-21 17:12:49,448 INFO [train.py:886] (1/4) Epoch 5, batch 2450, loss[loss=0.01831, audio_tagging_loss=0.01831, over 25000.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 4947700.60 frames. ], batch size: 100, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:12:51,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=143426.66666666666, ans=0.125
2023-12-21 17:12:58,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143493.33333333334, ans=0.125
2023-12-21 17:13:08,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5
2023-12-21 17:13:23,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=143626.66666666666, ans=0.0
2023-12-21 17:13:27,838 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.242e-01
2023-12-21 17:13:39,907 INFO [train.py:886] (1/4) Epoch 5, batch 2500, loss[loss=0.01636, audio_tagging_loss=0.01636, over 24750.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4944311.08 frames. ], batch size: 99, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:13:40,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0
2023-12-21 17:14:02,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=143893.33333333334, ans=0.0
2023-12-21 17:14:08,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=143893.33333333334, ans=0.1
2023-12-21 17:14:26,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5
2023-12-21 17:14:27,831 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.613e+01 2.844e+01 3.079e+01 3.579e+01, threshold=5.689e+01, percent-clipped=0.0
2023-12-21 17:14:31,608 INFO [train.py:886] (1/4) Epoch 5, batch 2550, loss[loss=0.01977, audio_tagging_loss=0.01977, over 25000.00 frames. ], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 4938630.21 frames. ], batch size: 100, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:14:43,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=144160.0, ans=0.125
2023-12-21 17:15:00,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=15.0
2023-12-21 17:15:05,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0
2023-12-21 17:15:09,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.91 vs. limit=22.5
2023-12-21 17:15:21,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=144426.66666666666, ans=0.0
2023-12-21 17:15:21,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=144426.66666666666, ans=0.0
2023-12-21 17:15:22,874 INFO [train.py:886] (1/4) Epoch 5, batch 2600, loss[loss=0.01613, audio_tagging_loss=0.01613, over 24750.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4937097.14 frames. ], batch size: 99, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:15:39,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.25 vs. limit=22.5
2023-12-21 17:15:52,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.75 vs. limit=22.5
2023-12-21 17:16:00,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.80 vs. limit=22.5
2023-12-21 17:16:04,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=15.0
2023-12-21 17:16:10,459 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.544e+01 2.733e+01 3.057e+01 4.063e+01, threshold=5.467e+01, percent-clipped=0.0
2023-12-21 17:16:14,199 INFO [train.py:886] (1/4) Epoch 5, batch 2650, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01655, audio_tagging_loss=0.01655, over 4938676.75 frames. ], batch size: 100, lr: 2.09e-02, grad_scale: 64.0
2023-12-21 17:16:17,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0
2023-12-21 17:16:33,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=144826.66666666666, ans=0.125
2023-12-21 17:16:50,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=144960.0, ans=0.0
2023-12-21 17:16:51,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=144960.0, ans=0.125
2023-12-21 17:16:57,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145026.66666666666, ans=0.125
2023-12-21 17:17:02,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=145026.66666666666, ans=0.125
2023-12-21 17:17:07,039 INFO [train.py:886] (1/4) Epoch 5, batch 2700, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24750.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 4944920.91 frames.
], batch size: 99, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:17:10,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145093.33333333334, ans=0.1 2023-12-21 17:17:30,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2023-12-21 17:17:52,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=145360.0, ans=0.125 2023-12-21 17:17:53,874 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.469e+01 2.654e+01 2.877e+01 3.564e+01, threshold=5.308e+01, percent-clipped=0.0 2023-12-21 17:17:57,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=145426.66666666666, ans=0.0 2023-12-21 17:17:57,717 INFO [train.py:886] (1/4) Epoch 5, batch 2750, loss[loss=0.01671, audio_tagging_loss=0.01671, over 25000.00 frames. ], tot_loss[loss=0.01647, audio_tagging_loss=0.01647, over 4940959.89 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:17:59,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=145426.66666666666, ans=0.125 2023-12-21 17:18:12,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=145493.33333333334, ans=0.125 2023-12-21 17:18:32,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=145626.66666666666, ans=0.125 2023-12-21 17:18:37,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=145626.66666666666, ans=0.125 2023-12-21 17:18:50,491 INFO [train.py:886] (1/4) Epoch 5, batch 2800, loss[loss=0.01731, audio_tagging_loss=0.01731, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4937783.29 frames. ], batch size: 99, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:18:57,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=145760.0, ans=0.0 2023-12-21 17:19:17,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=145893.33333333334, ans=0.0 2023-12-21 17:19:38,108 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+01 2.607e+01 2.811e+01 3.107e+01 4.329e+01, threshold=5.621e+01, percent-clipped=0.0 2023-12-21 17:19:40,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=146026.66666666666, ans=0.125 2023-12-21 17:19:42,649 INFO [train.py:886] (1/4) Epoch 5, batch 2850, loss[loss=0.01651, audio_tagging_loss=0.01651, over 25000.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4933788.32 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:19:45,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. 
limit=15.0 2023-12-21 17:20:02,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=146226.66666666666, ans=0.125 2023-12-21 17:20:04,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=146226.66666666666, ans=0.0 2023-12-21 17:20:20,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=18.22 vs. limit=15.0 2023-12-21 17:20:31,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-12-21 17:20:33,325 INFO [train.py:886] (1/4) Epoch 5, batch 2900, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4934838.06 frames. ], batch size: 99, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:20:42,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=146426.66666666666, ans=0.025 2023-12-21 17:20:42,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=146426.66666666666, ans=0.125 2023-12-21 17:20:58,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=146560.0, ans=0.1 2023-12-21 17:21:02,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0 2023-12-21 17:21:10,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2023-12-21 17:21:21,921 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.595e+01 2.753e+01 2.968e+01 3.866e+01, threshold=5.507e+01, percent-clipped=0.0 2023-12-21 17:21:25,877 INFO [train.py:886] (1/4) Epoch 5, batch 2950, loss[loss=0.01784, audio_tagging_loss=0.01784, over 24750.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 4933571.14 frames. ], batch size: 99, lr: 2.07e-02, grad_scale: 64.0 2023-12-21 17:21:28,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=146760.0, ans=0.04949747468305833 2023-12-21 17:21:41,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=146826.66666666666, ans=0.0 2023-12-21 17:21:50,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=146893.33333333334, ans=0.0 2023-12-21 17:21:52,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=146893.33333333334, ans=0.125 2023-12-21 17:21:56,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.11 vs. limit=6.0 2023-12-21 17:22:08,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. 
limit=10.0 2023-12-21 17:22:18,287 INFO [train.py:886] (1/4) Epoch 5, batch 3000, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4940005.56 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0 2023-12-21 17:22:18,287 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 17:22:25,449 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.0212, 1.6368, 2.2704, 1.6895, 1.8795, 2.2889, 2.1843, 2.0050], device='cuda:1') 2023-12-21 17:22:26,994 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8515, 2.7334, 2.7090, 2.6809], device='cuda:1') 2023-12-21 17:22:39,429 INFO [train.py:917] (1/4) Epoch 5, validation: loss=0.04009, audio_tagging_loss=0.04009, over 3737520.00 frames. 2023-12-21 17:22:39,430 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 17:22:50,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.74 vs. limit=15.0 2023-12-21 17:22:51,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=147160.0, ans=0.95 2023-12-21 17:23:14,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=147293.33333333334, ans=0.2 2023-12-21 17:23:14,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-21 17:23:25,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=147360.0, ans=0.125 2023-12-21 17:23:27,623 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.543e+01 2.712e+01 2.939e+01 3.335e+01, threshold=5.424e+01, percent-clipped=0.0 2023-12-21 17:23:31,451 INFO [train.py:886] (1/4) Epoch 5, batch 3050, loss[loss=0.01714, audio_tagging_loss=0.01714, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4942812.61 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0 2023-12-21 17:24:03,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=147626.66666666666, ans=0.1 2023-12-21 17:24:12,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=147693.33333333334, ans=0.2 2023-12-21 17:24:23,426 INFO [train.py:886] (1/4) Epoch 5, batch 3100, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4952612.27 frames. 
], batch size: 100, lr: 2.07e-02, grad_scale: 64.0 2023-12-21 17:24:24,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147760.0, ans=0.1 2023-12-21 17:24:29,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147760.0, ans=0.1 2023-12-21 17:24:32,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=147826.66666666666, ans=0.125 2023-12-21 17:24:40,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=147826.66666666666, ans=0.0 2023-12-21 17:24:51,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=147893.33333333334, ans=0.125 2023-12-21 17:24:54,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-12-21 17:24:57,597 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.341e-02 2023-12-21 17:25:06,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2023-12-21 17:25:09,745 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.613e+01 2.780e+01 2.978e+01 3.396e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 17:25:13,596 INFO [train.py:886] (1/4) Epoch 5, batch 3150, loss[loss=0.01859, audio_tagging_loss=0.01859, over 24750.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4949911.95 frames. ], batch size: 99, lr: 2.07e-02, grad_scale: 64.0 2023-12-21 17:25:16,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=148093.33333333334, ans=0.2 2023-12-21 17:25:17,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2023-12-21 17:25:30,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=148160.0, ans=0.125 2023-12-21 17:25:41,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=148226.66666666666, ans=0.0 2023-12-21 17:25:42,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=148226.66666666666, ans=0.0 2023-12-21 17:25:48,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148293.33333333334, ans=0.125 2023-12-21 17:25:49,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=148293.33333333334, ans=0.5 2023-12-21 17:25:55,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=148360.0, ans=0.0 2023-12-21 17:25:58,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=148360.0, ans=0.0 2023-12-21 17:26:05,981 INFO [train.py:886] (1/4) Epoch 5, batch 3200, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. 
], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4946820.87 frames. ], batch size: 99, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:26:36,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=15.0 2023-12-21 17:26:43,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=148626.66666666666, ans=0.125 2023-12-21 17:26:43,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-12-21 17:26:52,667 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.499e+01 2.673e+01 2.895e+01 4.175e+01, threshold=5.346e+01, percent-clipped=0.0 2023-12-21 17:26:53,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.52 vs. limit=22.5 2023-12-21 17:26:57,283 INFO [train.py:886] (1/4) Epoch 5, batch 3250, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4943976.72 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:27:01,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=148760.0, ans=0.125 2023-12-21 17:27:08,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148826.66666666666, ans=0.0 2023-12-21 17:27:37,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=12.0 2023-12-21 17:27:47,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=15.0 2023-12-21 17:27:48,887 INFO [train.py:886] (1/4) Epoch 5, batch 3300, loss[loss=0.01676, audio_tagging_loss=0.01676, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4950848.28 frames. 
], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:27:53,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149093.33333333334, ans=0.125 2023-12-21 17:27:53,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=149093.33333333334, ans=0.125 2023-12-21 17:27:54,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149093.33333333334, ans=0.1 2023-12-21 17:28:08,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=149160.0, ans=0.0 2023-12-21 17:28:13,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149226.66666666666, ans=0.1 2023-12-21 17:28:18,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149226.66666666666, ans=0.125 2023-12-21 17:28:36,620 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.590e+01 2.838e+01 3.066e+01 3.975e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-21 17:28:41,795 INFO [train.py:886] (1/4) Epoch 5, batch 3350, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4954419.37 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:28:44,880 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.579e-01 2023-12-21 17:28:45,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=149426.66666666666, ans=0.125 2023-12-21 17:28:48,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149426.66666666666, ans=0.1 2023-12-21 17:28:48,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=149426.66666666666, ans=0.0 2023-12-21 17:28:53,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=149493.33333333334, ans=0.2 2023-12-21 17:28:55,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=149493.33333333334, ans=0.125 2023-12-21 17:28:56,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=149493.33333333334, ans=0.125 2023-12-21 17:28:59,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=149493.33333333334, ans=0.015 2023-12-21 17:29:07,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.37 vs. limit=22.5 2023-12-21 17:29:18,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-12-21 17:29:19,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.83 vs. 
limit=15.0 2023-12-21 17:29:31,399 INFO [train.py:886] (1/4) Epoch 5, batch 3400, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4954249.60 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:29:39,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=149760.0, ans=0.0 2023-12-21 17:29:42,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=149826.66666666666, ans=0.0 2023-12-21 17:29:55,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=149893.33333333334, ans=0.0 2023-12-21 17:30:00,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.24 vs. limit=15.0 2023-12-21 17:30:10,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149960.0, ans=0.125 2023-12-21 17:30:20,656 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.600e+01 2.817e+01 3.077e+01 4.267e+01, threshold=5.633e+01, percent-clipped=0.0 2023-12-21 17:30:24,437 INFO [train.py:886] (1/4) Epoch 5, batch 3450, loss[loss=0.01748, audio_tagging_loss=0.01748, over 24750.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4949476.09 frames. ], batch size: 99, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:30:28,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.71 vs. limit=10.0 2023-12-21 17:30:32,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150093.33333333334, ans=0.1 2023-12-21 17:30:40,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=150160.0, ans=0.125 2023-12-21 17:30:41,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-12-21 17:31:00,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=150293.33333333334, ans=0.0 2023-12-21 17:31:01,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150293.33333333334, ans=0.1 2023-12-21 17:31:01,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150293.33333333334, ans=0.1 2023-12-21 17:31:11,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2023-12-21 17:31:15,701 INFO [train.py:886] (1/4) Epoch 5, batch 3500, loss[loss=0.01551, audio_tagging_loss=0.01551, over 24750.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4946564.98 frames. 
], batch size: 99, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:31:42,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=150560.0, ans=0.0 2023-12-21 17:31:44,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.96 vs. limit=10.0 2023-12-21 17:32:01,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=12.0 2023-12-21 17:32:03,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.549e+01 2.777e+01 3.022e+01 4.198e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-21 17:32:04,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2023-12-21 17:32:07,372 INFO [train.py:886] (1/4) Epoch 5, batch 3550, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 4948399.77 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:32:19,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.59 vs. limit=22.5 2023-12-21 17:32:22,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=150826.66666666666, ans=0.125 2023-12-21 17:32:24,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0 2023-12-21 17:32:41,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150960.0, ans=0.1 2023-12-21 17:32:55,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=151026.66666666666, ans=0.5 2023-12-21 17:32:59,025 INFO [train.py:886] (1/4) Epoch 5, batch 3600, loss[loss=0.01658, audio_tagging_loss=0.01658, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4945015.04 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:33:02,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=151093.33333333334, ans=0.0 2023-12-21 17:33:04,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=151093.33333333334, ans=0.125 2023-12-21 17:33:07,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=151093.33333333334, ans=0.1 2023-12-21 17:33:19,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=151226.66666666666, ans=0.125 2023-12-21 17:33:23,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. 
limit=6.0 2023-12-21 17:33:32,411 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=6.199e-01 2023-12-21 17:33:39,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=151360.0, ans=0.125 2023-12-21 17:33:46,185 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.518e+01 2.646e+01 2.841e+01 3.411e+01, threshold=5.291e+01, percent-clipped=0.0 2023-12-21 17:33:49,958 INFO [train.py:886] (1/4) Epoch 5, batch 3650, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4951896.67 frames. ], batch size: 100, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:33:56,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2023-12-21 17:34:03,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-12-21 17:34:03,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.22 vs. limit=22.5 2023-12-21 17:34:11,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=151560.0, ans=0.125 2023-12-21 17:34:17,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=151560.0, ans=0.0 2023-12-21 17:34:43,000 INFO [train.py:886] (1/4) Epoch 5, batch 3700, loss[loss=0.01581, audio_tagging_loss=0.01581, over 24750.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4956103.71 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:34:52,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=151826.66666666666, ans=0.125 2023-12-21 17:35:13,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.85 vs. limit=22.5 2023-12-21 17:35:13,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0 2023-12-21 17:35:26,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=152026.66666666666, ans=0.125 2023-12-21 17:35:26,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0 2023-12-21 17:35:30,108 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 2.577e+01 2.848e+01 3.103e+01 4.088e+01, threshold=5.696e+01, percent-clipped=0.0 2023-12-21 17:35:30,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=152026.66666666666, ans=0.125 2023-12-21 17:35:34,628 INFO [train.py:886] (1/4) Epoch 5, batch 3750, loss[loss=0.01975, audio_tagging_loss=0.01975, over 24750.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 4956737.30 frames. 
], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:35:38,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152093.33333333334, ans=0.1 2023-12-21 17:35:42,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.50 vs. limit=22.5 2023-12-21 17:35:49,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2023-12-21 17:36:13,473 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.356e-02 2023-12-21 17:36:18,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-12-21 17:36:26,441 INFO [train.py:886] (1/4) Epoch 5, batch 3800, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24750.00 frames. ], tot_loss[loss=0.01679, audio_tagging_loss=0.01679, over 4949025.87 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:36:48,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.67 vs. limit=22.5 2023-12-21 17:37:04,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=152626.66666666666, ans=0.2 2023-12-21 17:37:06,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=152626.66666666666, ans=0.125 2023-12-21 17:37:14,680 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.532e+01 2.751e+01 2.984e+01 4.150e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 17:37:18,459 INFO [train.py:886] (1/4) Epoch 5, batch 3850, loss[loss=0.01667, audio_tagging_loss=0.01667, over 24750.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4945274.33 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:37:27,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=152826.66666666666, ans=0.125 2023-12-21 17:37:44,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=152893.33333333334, ans=0.0 2023-12-21 17:37:57,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=152960.0, ans=0.0 2023-12-21 17:38:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=153026.66666666666, ans=0.125 2023-12-21 17:38:11,306 INFO [train.py:886] (1/4) Epoch 5, batch 3900, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4948231.65 frames. 
], batch size: 100, lr: 2.03e-02, grad_scale: 64.0 2023-12-21 17:38:32,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=153226.66666666666, ans=0.04949747468305833 2023-12-21 17:38:40,059 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 17:38:40,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=15.0 2023-12-21 17:38:46,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=153293.33333333334, ans=0.125 2023-12-21 17:38:58,378 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.553e+01 2.751e+01 2.942e+01 3.918e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-21 17:39:00,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153360.0, ans=0.1 2023-12-21 17:39:02,242 INFO [train.py:886] (1/4) Epoch 5, batch 3950, loss[loss=0.01485, audio_tagging_loss=0.01485, over 25000.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4943615.37 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 64.0 2023-12-21 17:39:21,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=153493.33333333334, ans=0.025 2023-12-21 17:39:21,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=153493.33333333334, ans=0.125 2023-12-21 17:39:32,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2023-12-21 17:39:34,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=153626.66666666666, ans=0.125 2023-12-21 17:39:55,698 INFO [train.py:886] (1/4) Epoch 5, batch 4000, loss[loss=0.01758, audio_tagging_loss=0.01758, over 22127.00 frames. ], tot_loss[loss=0.01654, audio_tagging_loss=0.01654, over 4948273.67 frames. ], batch size: 107, lr: 2.03e-02, grad_scale: 128.0 2023-12-21 17:39:58,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2023-12-21 17:39:59,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=153760.0, ans=6.0 2023-12-21 17:40:01,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=153760.0, ans=0.2 2023-12-21 17:40:07,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.96 vs. limit=10.0 2023-12-21 17:40:20,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0 2023-12-21 17:40:21,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.55 vs. 
limit=15.0 2023-12-21 17:40:42,023 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.602e+01 2.728e+01 2.900e+01 3.775e+01, threshold=5.457e+01, percent-clipped=0.0 2023-12-21 17:40:46,629 INFO [train.py:886] (1/4) Epoch 5, batch 4050, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4954041.08 frames. ], batch size: 99, lr: 2.03e-02, grad_scale: 128.0 2023-12-21 17:40:59,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=154160.0, ans=0.125 2023-12-21 17:41:03,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=154160.0, ans=0.125 2023-12-21 17:41:06,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=154226.66666666666, ans=0.09899494936611666 2023-12-21 17:41:09,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154226.66666666666, ans=0.125 2023-12-21 17:41:11,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=154226.66666666666, ans=15.0 2023-12-21 17:41:33,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=154360.0, ans=0.125 2023-12-21 17:41:38,558 INFO [train.py:886] (1/4) Epoch 5, batch 4100, loss[loss=0.0167, audio_tagging_loss=0.0167, over 24750.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4953477.06 frames. ], batch size: 99, lr: 2.03e-02, grad_scale: 128.0 2023-12-21 17:41:38,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=154426.66666666666, ans=0.2 2023-12-21 17:41:38,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=154426.66666666666, ans=0.0 2023-12-21 17:41:42,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=154426.66666666666, ans=0.1 2023-12-21 17:41:42,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=154426.66666666666, ans=0.125 2023-12-21 17:41:47,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=154493.33333333334, ans=0.0 2023-12-21 17:41:47,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=154493.33333333334, ans=0.125 2023-12-21 17:41:59,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=154560.0, ans=0.2 2023-12-21 17:42:01,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=154560.0, ans=0.125 2023-12-21 17:42:18,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=154693.33333333334, ans=0.125 2023-12-21 17:42:22,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=154693.33333333334, ans=0.0 2023-12-21 17:42:26,229 WARNING [optim.py:484] 
(1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.535e+01 2.761e+01 2.990e+01 3.473e+01, threshold=5.523e+01, percent-clipped=0.0 2023-12-21 17:42:30,734 INFO [train.py:886] (1/4) Epoch 5, batch 4150, loss[loss=0.01465, audio_tagging_loss=0.01465, over 24750.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 4950801.44 frames. ], batch size: 99, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:42:46,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154826.66666666666, ans=0.1 2023-12-21 17:42:48,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154826.66666666666, ans=0.1 2023-12-21 17:43:21,615 INFO [train.py:886] (1/4) Epoch 5, batch 4200, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24096.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4955635.15 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:43:31,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=155160.0, ans=0.2 2023-12-21 17:43:36,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=155160.0, ans=0.0 2023-12-21 17:43:42,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=155226.66666666666, ans=0.0 2023-12-21 17:43:56,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=155293.33333333334, ans=0.0 2023-12-21 17:44:09,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=155360.0, ans=0.0 2023-12-21 17:44:09,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.602e+01 2.757e+01 3.020e+01 3.961e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 17:44:13,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=155426.66666666666, ans=0.125 2023-12-21 17:44:13,610 INFO [train.py:886] (1/4) Epoch 5, batch 4250, loss[loss=0.01493, audio_tagging_loss=0.01493, over 25000.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4945762.41 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:44:31,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=155493.33333333334, ans=0.125 2023-12-21 17:44:36,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=155560.0, ans=0.125 2023-12-21 17:44:39,517 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.133e+01 2023-12-21 17:44:39,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-12-21 17:44:40,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=155560.0, ans=0.125 2023-12-21 17:44:40,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.25 vs. 
limit=10.0 2023-12-21 17:44:45,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2023-12-21 17:44:51,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.26 vs. limit=22.5 2023-12-21 17:44:54,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=155693.33333333334, ans=0.0 2023-12-21 17:45:00,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155693.33333333334, ans=0.1 2023-12-21 17:45:01,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=155693.33333333334, ans=0.125 2023-12-21 17:45:03,644 INFO [train.py:886] (1/4) Epoch 5, batch 4300, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24023.00 frames. ], tot_loss[loss=0.01643, audio_tagging_loss=0.01643, over 4942419.23 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:45:04,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155760.0, ans=0.1 2023-12-21 17:45:15,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5 2023-12-21 17:45:29,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=155893.33333333334, ans=0.125 2023-12-21 17:45:36,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155960.0, ans=0.1 2023-12-21 17:45:53,523 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.657e+01 2.804e+01 3.021e+01 3.869e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-21 17:45:56,396 INFO [train.py:886] (1/4) Epoch 5, batch 4350, loss[loss=0.01938, audio_tagging_loss=0.01938, over 25000.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4942064.25 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 64.0 2023-12-21 17:45:59,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=156093.33333333334, ans=0.125 2023-12-21 17:46:28,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=156293.33333333334, ans=0.125 2023-12-21 17:46:31,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=156293.33333333334, ans=15.0 2023-12-21 17:46:31,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=156293.33333333334, ans=0.125 2023-12-21 17:46:46,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=156360.0, ans=0.0 2023-12-21 17:46:48,704 INFO [train.py:886] (1/4) Epoch 5, batch 4400, loss[loss=0.02128, audio_tagging_loss=0.02128, over 24750.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4938285.10 frames. 
], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:47:01,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=156493.33333333334, ans=0.125 2023-12-21 17:47:29,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=156693.33333333334, ans=0.125 2023-12-21 17:47:31,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0 2023-12-21 17:47:32,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=156693.33333333334, ans=0.0 2023-12-21 17:47:34,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=156693.33333333334, ans=0.0 2023-12-21 17:47:35,931 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.647e+01 2.808e+01 3.114e+01 3.579e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-21 17:47:38,844 INFO [train.py:886] (1/4) Epoch 5, batch 4450, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24750.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4931539.79 frames. ], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:47:42,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156760.0, ans=0.1 2023-12-21 17:47:59,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=12.0 2023-12-21 17:48:08,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=156893.33333333334, ans=0.0 2023-12-21 17:48:10,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=156960.0, ans=0.0 2023-12-21 17:48:12,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=156960.0, ans=0.125 2023-12-21 17:48:16,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=156960.0, ans=0.0 2023-12-21 17:48:18,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=156960.0, ans=0.0 2023-12-21 17:48:18,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=156960.0, ans=0.5 2023-12-21 17:48:31,773 INFO [train.py:886] (1/4) Epoch 5, batch 4500, loss[loss=0.01846, audio_tagging_loss=0.01846, over 24750.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4936637.80 frames. 
], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:48:32,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=157093.33333333334, ans=0.0 2023-12-21 17:48:39,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=157093.33333333334, ans=0.2 2023-12-21 17:49:04,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=157293.33333333334, ans=0.2 2023-12-21 17:49:10,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=157293.33333333334, ans=0.0 2023-12-21 17:49:12,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=157360.0, ans=0.2 2023-12-21 17:49:18,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=157360.0, ans=0.125 2023-12-21 17:49:20,479 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.535e+01 2.688e+01 2.916e+01 3.475e+01, threshold=5.375e+01, percent-clipped=0.0 2023-12-21 17:49:20,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=157360.0, ans=0.0 2023-12-21 17:49:22,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157426.66666666666, ans=0.1 2023-12-21 17:49:23,261 INFO [train.py:886] (1/4) Epoch 5, batch 4550, loss[loss=0.01725, audio_tagging_loss=0.01725, over 24750.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4934563.28 frames. ], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:49:36,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.99 vs. limit=15.0 2023-12-21 17:49:55,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=157626.66666666666, ans=0.0 2023-12-21 17:50:02,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157626.66666666666, ans=0.0 2023-12-21 17:50:11,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-12-21 17:50:15,392 INFO [train.py:886] (1/4) Epoch 5, batch 4600, loss[loss=0.01784, audio_tagging_loss=0.01784, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4942351.81 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:50:18,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157760.0, ans=0.1 2023-12-21 17:50:20,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=157760.0, ans=0.0 2023-12-21 17:50:25,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=157826.66666666666, ans=0.0 2023-12-21 17:50:26,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.10 vs. 
limit=15.0 2023-12-21 17:50:28,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=157826.66666666666, ans=0.0 2023-12-21 17:50:28,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-21 17:50:29,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157826.66666666666, ans=0.1 2023-12-21 17:50:44,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157893.33333333334, ans=0.1 2023-12-21 17:50:59,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=158026.66666666666, ans=0.125 2023-12-21 17:51:00,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=158026.66666666666, ans=0.0 2023-12-21 17:51:04,727 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.622e+01 2.861e+01 3.041e+01 4.016e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 17:51:08,382 INFO [train.py:886] (1/4) Epoch 5, batch 4650, loss[loss=0.01611, audio_tagging_loss=0.01611, over 25000.00 frames. ], tot_loss[loss=0.01664, audio_tagging_loss=0.01664, over 4948678.94 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:51:12,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=158093.33333333334, ans=0.0 2023-12-21 17:51:22,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=158160.0, ans=0.125 2023-12-21 17:51:36,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158226.66666666666, ans=0.1 2023-12-21 17:51:58,922 INFO [train.py:886] (1/4) Epoch 5, batch 4700, loss[loss=0.01741, audio_tagging_loss=0.01741, over 24750.00 frames. ], tot_loss[loss=0.01669, audio_tagging_loss=0.01669, over 4945158.97 frames. ], batch size: 99, lr: 2.00e-02, grad_scale: 64.0 2023-12-21 17:52:19,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=158560.0, ans=0.0 2023-12-21 17:52:19,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2023-12-21 17:52:29,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=158626.66666666666, ans=0.95 2023-12-21 17:52:43,576 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.632e+01 2.850e+01 3.066e+01 3.768e+01, threshold=5.700e+01, percent-clipped=0.0 2023-12-21 17:52:46,344 INFO [train.py:886] (1/4) Epoch 5, batch 4750, loss[loss=0.01857, audio_tagging_loss=0.01857, over 25000.00 frames. ], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 4946793.28 frames. 
], batch size: 100, lr: 2.00e-02, grad_scale: 64.0 2023-12-21 17:52:53,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=158760.0, ans=0.2 2023-12-21 17:52:57,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2023-12-21 17:53:24,107 INFO [train.py:886] (1/4) Epoch 6, batch 0, loss[loss=0.04036, audio_tagging_loss=0.04036, over 24035.00 frames. ], tot_loss[loss=0.04036, audio_tagging_loss=0.04036, over 24035.00 frames. ], batch size: 100, lr: 1.87e-02, grad_scale: 64.0 2023-12-21 17:53:24,107 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 17:53:45,410 INFO [train.py:917] (1/4) Epoch 6, validation: loss=0.03649, audio_tagging_loss=0.03649, over 3737520.00 frames. 2023-12-21 17:53:45,411 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 17:53:53,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-12-21 17:54:02,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=158933.33333333334, ans=0.125 2023-12-21 17:54:12,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=159000.0, ans=0.95 2023-12-21 17:54:27,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=159133.33333333334, ans=0.125 2023-12-21 17:54:30,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=159133.33333333334, ans=0.125 2023-12-21 17:54:31,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2023-12-21 17:54:32,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-12-21 17:54:36,666 INFO [train.py:886] (1/4) Epoch 6, batch 50, loss[loss=0.0211, audio_tagging_loss=0.0211, over 25000.00 frames. ], tot_loss[loss=0.02597, audio_tagging_loss=0.02597, over 1126431.70 frames. 
], batch size: 100, lr: 1.87e-02, grad_scale: 64.0 2023-12-21 17:54:47,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=159266.66666666666, ans=0.2 2023-12-21 17:54:52,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=159266.66666666666, ans=0.125 2023-12-21 17:55:05,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=159333.33333333334, ans=0.2 2023-12-21 17:55:08,011 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.015e+01 3.367e+01 3.698e+01 8.619e+01, threshold=6.734e+01, percent-clipped=4.0 2023-12-21 17:55:09,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=159400.0, ans=0.0 2023-12-21 17:55:10,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.56 vs. limit=10.0 2023-12-21 17:55:12,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=159400.0, ans=0.125 2023-12-21 17:55:20,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=159466.66666666666, ans=0.125 2023-12-21 17:55:24,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=159466.66666666666, ans=0.5 2023-12-21 17:55:28,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=159533.33333333334, ans=0.0 2023-12-21 17:55:28,802 INFO [train.py:886] (1/4) Epoch 6, batch 100, loss[loss=0.01795, audio_tagging_loss=0.01795, over 25000.00 frames. ], tot_loss[loss=0.02272, audio_tagging_loss=0.02272, over 1979247.91 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:55:39,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0 2023-12-21 17:55:49,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.30 vs. limit=22.5 2023-12-21 17:55:51,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2023-12-21 17:55:52,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=159666.66666666666, ans=0.125 2023-12-21 17:55:58,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2023-12-21 17:56:02,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. 
limit=22.5 2023-12-21 17:56:12,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=159800.0, ans=0.125 2023-12-21 17:56:15,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=159800.0, ans=10.0 2023-12-21 17:56:19,829 INFO [train.py:886] (1/4) Epoch 6, batch 150, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.02053, audio_tagging_loss=0.02053, over 2642910.68 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:56:26,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=159866.66666666666, ans=0.125 2023-12-21 17:56:33,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=159933.33333333334, ans=0.125 2023-12-21 17:56:54,053 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.731e+01 2.909e+01 3.115e+01 3.553e+01, threshold=5.819e+01, percent-clipped=0.0 2023-12-21 17:57:14,135 INFO [train.py:886] (1/4) Epoch 6, batch 200, loss[loss=0.01831, audio_tagging_loss=0.01831, over 25000.00 frames. ], tot_loss[loss=0.01933, audio_tagging_loss=0.01933, over 3155711.17 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:57:20,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=160200.0, ans=0.035 2023-12-21 17:57:22,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2023-12-21 17:57:34,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=160333.33333333334, ans=0.0 2023-12-21 17:57:44,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=160400.0, ans=0.2 2023-12-21 17:58:01,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=160466.66666666666, ans=0.02 2023-12-21 17:58:05,845 INFO [train.py:886] (1/4) Epoch 6, batch 250, loss[loss=0.01722, audio_tagging_loss=0.01722, over 25000.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 3552780.92 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:58:09,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160533.33333333334, ans=0.1 2023-12-21 17:58:15,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=160600.0, ans=0.125 2023-12-21 17:58:17,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. 
limit=15.0 2023-12-21 17:58:27,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160666.66666666666, ans=0.125 2023-12-21 17:58:29,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=160666.66666666666, ans=0.125 2023-12-21 17:58:37,629 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.574e+01 2.757e+01 2.978e+01 3.329e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 17:58:44,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=160733.33333333334, ans=0.04949747468305833 2023-12-21 17:58:50,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=160800.0, ans=0.2 2023-12-21 17:58:55,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-12-21 17:58:55,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2023-12-21 17:58:56,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160866.66666666666, ans=0.1 2023-12-21 17:58:57,047 INFO [train.py:886] (1/4) Epoch 6, batch 300, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24750.00 frames. ], tot_loss[loss=0.01801, audio_tagging_loss=0.01801, over 3858174.31 frames. ], batch size: 99, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:59:02,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=160866.66666666666, ans=0.125 2023-12-21 17:59:02,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160866.66666666666, ans=0.125 2023-12-21 17:59:12,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=160933.33333333334, ans=0.125 2023-12-21 17:59:33,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0 2023-12-21 17:59:49,341 INFO [train.py:886] (1/4) Epoch 6, batch 350, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4089190.38 frames. ], batch size: 99, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 17:59:49,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=161200.0, ans=0.0 2023-12-21 17:59:53,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. 
limit=15.0 2023-12-21 17:59:55,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=161200.0, ans=0.0 2023-12-21 18:00:03,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=161266.66666666666, ans=0.0 2023-12-21 18:00:07,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=161266.66666666666, ans=0.125 2023-12-21 18:00:19,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=161400.0, ans=0.1 2023-12-21 18:00:21,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.561e+01 2.750e+01 3.002e+01 3.673e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 18:00:26,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=161400.0, ans=0.125 2023-12-21 18:00:38,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-12-21 18:00:40,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=161533.33333333334, ans=0.0 2023-12-21 18:00:40,955 INFO [train.py:886] (1/4) Epoch 6, batch 400, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01739, audio_tagging_loss=0.01739, over 4279846.44 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:00:47,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=161533.33333333334, ans=0.07 2023-12-21 18:00:50,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=161533.33333333334, ans=0.2 2023-12-21 18:00:55,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=161600.0, ans=12.0 2023-12-21 18:00:55,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=12.0 2023-12-21 18:01:02,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=161666.66666666666, ans=0.125 2023-12-21 18:01:12,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2023-12-21 18:01:12,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.82 vs. limit=22.5 2023-12-21 18:01:16,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=161733.33333333334, ans=0.0 2023-12-21 18:01:22,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0 2023-12-21 18:01:32,975 INFO [train.py:886] (1/4) Epoch 6, batch 450, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.0171, audio_tagging_loss=0.0171, over 4427936.31 frames. 
], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:01:33,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=161866.66666666666, ans=0.95 2023-12-21 18:02:04,635 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.486e+01 2.731e+01 2.952e+01 3.646e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 18:02:07,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=162066.66666666666, ans=0.2 2023-12-21 18:02:25,552 INFO [train.py:886] (1/4) Epoch 6, batch 500, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 4543229.39 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:02:44,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=162266.66666666666, ans=0.0 2023-12-21 18:02:46,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=162333.33333333334, ans=0.125 2023-12-21 18:02:56,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=162400.0, ans=0.0 2023-12-21 18:02:57,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=162400.0, ans=0.125 2023-12-21 18:03:01,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2023-12-21 18:03:17,204 INFO [train.py:886] (1/4) Epoch 6, batch 550, loss[loss=0.01865, audio_tagging_loss=0.01865, over 25000.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4636666.00 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:03:32,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=162600.0, ans=0.125 2023-12-21 18:03:40,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=162666.66666666666, ans=0.0 2023-12-21 18:03:49,379 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.522e+01 2.669e+01 2.950e+01 3.849e+01, threshold=5.338e+01, percent-clipped=0.0 2023-12-21 18:04:08,693 INFO [train.py:886] (1/4) Epoch 6, batch 600, loss[loss=0.01936, audio_tagging_loss=0.01936, over 24750.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 4709184.45 frames. 
], batch size: 99, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:04:18,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162933.33333333334, ans=0.125 2023-12-21 18:04:23,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=162933.33333333334, ans=0.125 2023-12-21 18:04:44,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=163066.66666666666, ans=0.2 2023-12-21 18:04:58,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=163133.33333333334, ans=22.5 2023-12-21 18:04:59,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=163200.0, ans=0.125 2023-12-21 18:05:01,100 INFO [train.py:886] (1/4) Epoch 6, batch 650, loss[loss=0.01695, audio_tagging_loss=0.01695, over 24750.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4762641.06 frames. ], batch size: 99, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:05:06,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=163200.0, ans=0.0 2023-12-21 18:05:13,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=163266.66666666666, ans=0.125 2023-12-21 18:05:22,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163333.33333333334, ans=0.1 2023-12-21 18:05:27,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2023-12-21 18:05:31,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163400.0, ans=0.1 2023-12-21 18:05:33,145 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.590e+01 2.810e+01 2.971e+01 4.076e+01, threshold=5.620e+01, percent-clipped=0.0 2023-12-21 18:05:35,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=163400.0, ans=0.2 2023-12-21 18:05:52,690 INFO [train.py:886] (1/4) Epoch 6, batch 700, loss[loss=0.01602, audio_tagging_loss=0.01602, over 24750.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4801754.72 frames. ], batch size: 99, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:05:54,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.20 vs. limit=15.0 2023-12-21 18:05:58,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. 
limit=15.0 2023-12-21 18:06:00,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=163533.33333333334, ans=0.125 2023-12-21 18:06:13,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=163666.66666666666, ans=0.0 2023-12-21 18:06:17,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=15.0 2023-12-21 18:06:29,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2023-12-21 18:06:44,770 INFO [train.py:886] (1/4) Epoch 6, batch 750, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24750.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4835708.29 frames. ], batch size: 99, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:06:46,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=15.0 2023-12-21 18:06:47,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=15.0 2023-12-21 18:06:47,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=163866.66666666666, ans=0.05 2023-12-21 18:07:01,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=163933.33333333334, ans=0.125 2023-12-21 18:07:17,164 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.591e+01 2.731e+01 2.911e+01 3.574e+01, threshold=5.461e+01, percent-clipped=0.0 2023-12-21 18:07:17,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164066.66666666666, ans=0.1 2023-12-21 18:07:22,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0 2023-12-21 18:07:35,945 INFO [train.py:886] (1/4) Epoch 6, batch 800, loss[loss=0.01794, audio_tagging_loss=0.01794, over 25000.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4864228.36 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:07:40,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.19 vs. 
limit=15.0 2023-12-21 18:07:44,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=164200.0, ans=0.125 2023-12-21 18:07:47,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=164266.66666666666, ans=0.2 2023-12-21 18:08:05,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=164400.0, ans=0.125 2023-12-21 18:08:18,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=164466.66666666666, ans=0.0 2023-12-21 18:08:18,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=164466.66666666666, ans=0.0 2023-12-21 18:08:26,518 INFO [train.py:886] (1/4) Epoch 6, batch 850, loss[loss=0.0174, audio_tagging_loss=0.0174, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4884857.25 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:08:33,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=164533.33333333334, ans=0.0 2023-12-21 18:08:38,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-12-21 18:08:50,662 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.955e-02 2023-12-21 18:08:51,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=15.0 2023-12-21 18:08:55,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=17.99 vs. limit=15.0 2023-12-21 18:08:57,860 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.614e+01 2.752e+01 3.030e+01 4.008e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-21 18:09:03,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=164733.33333333334, ans=0.2 2023-12-21 18:09:17,287 INFO [train.py:886] (1/4) Epoch 6, batch 900, loss[loss=0.01689, audio_tagging_loss=0.01689, over 25000.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4905152.49 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:09:34,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=10.0 2023-12-21 18:09:43,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=165000.0, ans=0.09899494936611666 2023-12-21 18:10:08,990 INFO [train.py:886] (1/4) Epoch 6, batch 950, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4907990.05 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:10:16,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.39 vs. 
limit=15.0 2023-12-21 18:10:24,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=165266.66666666666, ans=0.125 2023-12-21 18:10:41,002 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.630e+01 2.802e+01 3.017e+01 4.236e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 18:10:48,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=165400.0, ans=15.0 2023-12-21 18:10:58,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=165466.66666666666, ans=0.125 2023-12-21 18:11:01,351 INFO [train.py:886] (1/4) Epoch 6, batch 1000, loss[loss=0.01647, audio_tagging_loss=0.01647, over 24750.00 frames. ], tot_loss[loss=0.01649, audio_tagging_loss=0.01649, over 4915590.19 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:11:09,080 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:11:17,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=165600.0, ans=0.125 2023-12-21 18:11:33,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=165733.33333333334, ans=0.0 2023-12-21 18:11:40,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2023-12-21 18:11:43,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=165800.0, ans=0.0 2023-12-21 18:11:52,572 INFO [train.py:886] (1/4) Epoch 6, batch 1050, loss[loss=0.01689, audio_tagging_loss=0.01689, over 25000.00 frames. ], tot_loss[loss=0.01638, audio_tagging_loss=0.01638, over 4924176.38 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:11:53,682 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:11:58,059 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:11:58,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165866.66666666666, ans=0.125 2023-12-21 18:12:01,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=12.0 2023-12-21 18:12:05,443 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:12:14,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.91 vs. 
limit=22.5 2023-12-21 18:12:16,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=166000.0, ans=0.0 2023-12-21 18:12:25,143 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.084e+01 2.514e+01 2.651e+01 2.892e+01 3.448e+01, threshold=5.301e+01, percent-clipped=0.0 2023-12-21 18:12:33,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166066.66666666666, ans=0.1 2023-12-21 18:12:42,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=166133.33333333334, ans=0.2 2023-12-21 18:12:43,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=166133.33333333334, ans=0.125 2023-12-21 18:12:44,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=166200.0, ans=0.2 2023-12-21 18:12:45,372 INFO [train.py:886] (1/4) Epoch 6, batch 1100, loss[loss=0.01735, audio_tagging_loss=0.01735, over 25000.00 frames. ], tot_loss[loss=0.01634, audio_tagging_loss=0.01634, over 4938137.97 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:12:55,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2023-12-21 18:13:26,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=166466.66666666666, ans=0.2 2023-12-21 18:13:30,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=166466.66666666666, ans=0.0 2023-12-21 18:13:37,826 INFO [train.py:886] (1/4) Epoch 6, batch 1150, loss[loss=0.01873, audio_tagging_loss=0.01873, over 25000.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4941734.76 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:13:40,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=166533.33333333334, ans=0.125 2023-12-21 18:13:43,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=166533.33333333334, ans=0.0 2023-12-21 18:13:45,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=12.0 2023-12-21 18:13:54,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=166600.0, ans=0.2 2023-12-21 18:13:54,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.92 vs. 
limit=22.5 2023-12-21 18:14:06,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=166666.66666666666, ans=0.035 2023-12-21 18:14:09,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166733.33333333334, ans=0.1 2023-12-21 18:14:10,171 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.615e+01 2.716e+01 2.897e+01 3.579e+01, threshold=5.433e+01, percent-clipped=0.0 2023-12-21 18:14:15,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=166733.33333333334, ans=0.125 2023-12-21 18:14:15,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=166733.33333333334, ans=0.125 2023-12-21 18:14:22,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=15.0 2023-12-21 18:14:29,585 INFO [train.py:886] (1/4) Epoch 6, batch 1200, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 4949248.42 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:14:29,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=166866.66666666666, ans=0.125 2023-12-21 18:14:47,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=166933.33333333334, ans=22.5 2023-12-21 18:14:49,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=167000.0, ans=22.5 2023-12-21 18:14:50,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=12.0 2023-12-21 18:15:15,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-12-21 18:15:18,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2023-12-21 18:15:20,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=167200.0, ans=0.125 2023-12-21 18:15:21,721 INFO [train.py:886] (1/4) Epoch 6, batch 1250, loss[loss=0.01705, audio_tagging_loss=0.01705, over 24750.00 frames. ], tot_loss[loss=0.01643, audio_tagging_loss=0.01643, over 4946231.86 frames. 
], batch size: 99, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:15:21,985 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.559e+00 2023-12-21 18:15:37,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=167266.66666666666, ans=0.1 2023-12-21 18:15:40,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=167266.66666666666, ans=0.1 2023-12-21 18:15:45,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167333.33333333334, ans=0.125 2023-12-21 18:15:47,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167333.33333333334, ans=0.0 2023-12-21 18:15:53,904 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.561e+01 2.733e+01 2.928e+01 3.774e+01, threshold=5.467e+01, percent-clipped=0.0 2023-12-21 18:15:56,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=167400.0, ans=22.5 2023-12-21 18:15:58,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.57 vs. limit=15.0 2023-12-21 18:16:11,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-21 18:16:13,505 INFO [train.py:886] (1/4) Epoch 6, batch 1300, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4949317.76 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:16:34,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167666.66666666666, ans=0.1 2023-12-21 18:16:43,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=167733.33333333334, ans=0.04949747468305833 2023-12-21 18:16:52,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=167733.33333333334, ans=0.125 2023-12-21 18:16:57,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=167800.0, ans=0.125 2023-12-21 18:17:05,876 INFO [train.py:886] (1/4) Epoch 6, batch 1350, loss[loss=0.01565, audio_tagging_loss=0.01565, over 25000.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4948613.61 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:17:11,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. 
limit=15.0 2023-12-21 18:17:20,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167933.33333333334, ans=0.1 2023-12-21 18:17:30,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=168000.0, ans=0.125 2023-12-21 18:17:31,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.90 vs. limit=22.5 2023-12-21 18:17:36,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=168066.66666666666, ans=0.125 2023-12-21 18:17:38,721 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.576e+01 2.766e+01 2.896e+01 3.623e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-21 18:17:41,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=168066.66666666666, ans=0.0 2023-12-21 18:17:43,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=168066.66666666666, ans=0.07 2023-12-21 18:17:57,279 INFO [train.py:886] (1/4) Epoch 6, batch 1400, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 4947740.82 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:18:01,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=168200.0, ans=0.125 2023-12-21 18:18:02,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=168200.0, ans=0.0 2023-12-21 18:18:25,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=168333.33333333334, ans=0.0 2023-12-21 18:18:32,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=168400.0, ans=0.125 2023-12-21 18:18:35,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=168400.0, ans=0.125 2023-12-21 18:18:49,001 INFO [train.py:886] (1/4) Epoch 6, batch 1450, loss[loss=0.01827, audio_tagging_loss=0.01827, over 25000.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4952475.82 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:18:49,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=168533.33333333334, ans=0.2 2023-12-21 18:19:10,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=168666.66666666666, ans=0.125 2023-12-21 18:19:21,217 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.551e+01 2.724e+01 2.909e+01 3.642e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 18:19:31,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=168800.0, ans=0.95 2023-12-21 18:19:38,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0 2023-12-21 18:19:40,619 INFO [train.py:886] (1/4) Epoch 6, batch 1500, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4955660.32 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:19:42,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168866.66666666666, ans=0.125 2023-12-21 18:19:43,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-12-21 18:19:44,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=168866.66666666666, ans=15.0 2023-12-21 18:19:54,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168933.33333333334, ans=0.1 2023-12-21 18:20:04,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.98 vs. limit=22.5 2023-12-21 18:20:14,601 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.113e+00 2023-12-21 18:20:21,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=12.0 2023-12-21 18:20:26,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=169133.33333333334, ans=0.125 2023-12-21 18:20:33,298 INFO [train.py:886] (1/4) Epoch 6, batch 1550, loss[loss=0.01537, audio_tagging_loss=0.01537, over 24750.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4946195.67 frames. ], batch size: 99, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:20:35,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=169200.0, ans=0.0 2023-12-21 18:20:41,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=169200.0, ans=0.125 2023-12-21 18:20:53,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=169333.33333333334, ans=0.07 2023-12-21 18:21:04,688 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.623e+01 2.761e+01 2.988e+01 3.455e+01, threshold=5.522e+01, percent-clipped=0.0 2023-12-21 18:21:10,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.31 vs. limit=22.5 2023-12-21 18:21:17,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=169466.66666666666, ans=0.09899494936611666 2023-12-21 18:21:23,954 INFO [train.py:886] (1/4) Epoch 6, batch 1600, loss[loss=0.01763, audio_tagging_loss=0.01763, over 22686.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4937879.72 frames. ], batch size: 107, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:21:30,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. 
limit=15.0 2023-12-21 18:21:44,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=169666.66666666666, ans=0.2 2023-12-21 18:21:50,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=169666.66666666666, ans=0.125 2023-12-21 18:21:56,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=169733.33333333334, ans=0.125 2023-12-21 18:21:57,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-12-21 18:22:04,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.62 vs. limit=15.0 2023-12-21 18:22:14,572 INFO [train.py:886] (1/4) Epoch 6, batch 1650, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4941444.16 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:22:18,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=169866.66666666666, ans=0.0 2023-12-21 18:22:18,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=169866.66666666666, ans=0.125 2023-12-21 18:22:19,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=169866.66666666666, ans=0.04949747468305833 2023-12-21 18:22:23,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169866.66666666666, ans=0.1 2023-12-21 18:22:27,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=169933.33333333334, ans=0.035 2023-12-21 18:22:30,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=169933.33333333334, ans=0.125 2023-12-21 18:22:47,810 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.597e+01 2.773e+01 2.993e+01 4.191e+01, threshold=5.546e+01, percent-clipped=0.0 2023-12-21 18:22:49,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=170066.66666666666, ans=0.125 2023-12-21 18:23:01,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=170133.33333333334, ans=0.0 2023-12-21 18:23:06,298 INFO [train.py:886] (1/4) Epoch 6, batch 1700, loss[loss=0.01659, audio_tagging_loss=0.01659, over 21710.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4941534.35 frames. ], batch size: 107, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:23:25,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-12-21 18:23:25,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. 
limit=6.0 2023-12-21 18:23:46,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=15.0 2023-12-21 18:23:54,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170466.66666666666, ans=0.1 2023-12-21 18:23:58,258 INFO [train.py:886] (1/4) Epoch 6, batch 1750, loss[loss=0.01547, audio_tagging_loss=0.01547, over 24750.00 frames. ], tot_loss[loss=0.01622, audio_tagging_loss=0.01622, over 4943356.42 frames. ], batch size: 99, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:24:12,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170600.0, ans=0.1 2023-12-21 18:24:23,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170666.66666666666, ans=0.1 2023-12-21 18:24:31,728 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.494e+01 2.677e+01 2.882e+01 3.741e+01, threshold=5.353e+01, percent-clipped=0.0 2023-12-21 18:24:39,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170800.0, ans=0.1 2023-12-21 18:24:50,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=170866.66666666666, ans=0.125 2023-12-21 18:24:51,584 INFO [train.py:886] (1/4) Epoch 6, batch 1800, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4944402.63 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:24:57,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=170866.66666666666, ans=0.2 2023-12-21 18:25:01,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-12-21 18:25:02,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=170933.33333333334, ans=0.125 2023-12-21 18:25:24,056 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.269e+00 2023-12-21 18:25:42,681 INFO [train.py:886] (1/4) Epoch 6, batch 1850, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24081.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4943472.00 frames. 
], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:25:56,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171266.66666666666, ans=0.1 2023-12-21 18:25:57,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=171266.66666666666, ans=0.07 2023-12-21 18:26:00,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=171266.66666666666, ans=0.0 2023-12-21 18:26:15,599 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.610e+01 2.790e+01 3.016e+01 3.716e+01, threshold=5.580e+01, percent-clipped=0.0 2023-12-21 18:26:34,786 INFO [train.py:886] (1/4) Epoch 6, batch 1900, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4940577.32 frames. ], batch size: 99, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:27:14,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=171733.33333333334, ans=0.1 2023-12-21 18:27:27,094 INFO [train.py:886] (1/4) Epoch 6, batch 1950, loss[loss=0.01713, audio_tagging_loss=0.01713, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4943414.98 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:27:36,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=171933.33333333334, ans=0.0 2023-12-21 18:27:46,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=172000.0, ans=0.125 2023-12-21 18:27:49,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=172000.0, ans=0.07 2023-12-21 18:28:00,980 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.565e+01 2.716e+01 2.900e+01 3.603e+01, threshold=5.432e+01, percent-clipped=0.0 2023-12-21 18:28:04,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=172066.66666666666, ans=0.0 2023-12-21 18:28:18,827 INFO [train.py:886] (1/4) Epoch 6, batch 2000, loss[loss=0.01519, audio_tagging_loss=0.01519, over 25000.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4944834.27 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:28:27,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=172266.66666666666, ans=0.1 2023-12-21 18:28:39,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.63 vs. 
limit=22.5 2023-12-21 18:28:43,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172333.33333333334, ans=0.1 2023-12-21 18:28:44,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=172333.33333333334, ans=0.05 2023-12-21 18:28:45,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=172333.33333333334, ans=0.0 2023-12-21 18:28:45,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=172333.33333333334, ans=0.1 2023-12-21 18:28:47,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-12-21 18:28:50,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172400.0, ans=0.125 2023-12-21 18:29:10,794 INFO [train.py:886] (1/4) Epoch 6, batch 2050, loss[loss=0.01908, audio_tagging_loss=0.01908, over 25000.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4951102.96 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:29:20,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=15.0 2023-12-21 18:29:28,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=172600.0, ans=0.125 2023-12-21 18:29:43,369 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.562e+01 2.748e+01 2.968e+01 3.569e+01, threshold=5.496e+01, percent-clipped=0.0 2023-12-21 18:30:01,213 INFO [train.py:886] (1/4) Epoch 6, batch 2100, loss[loss=0.01731, audio_tagging_loss=0.01731, over 25000.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4954875.27 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:30:31,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=12.0 2023-12-21 18:30:52,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=173200.0, ans=0.125 2023-12-21 18:30:53,353 INFO [train.py:886] (1/4) Epoch 6, batch 2150, loss[loss=0.01906, audio_tagging_loss=0.01906, over 25000.00 frames. ], tot_loss[loss=0.01619, audio_tagging_loss=0.01619, over 4961415.55 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:30:55,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.02 vs. 
limit=15.0 2023-12-21 18:31:12,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=173266.66666666666, ans=0.09899494936611666 2023-12-21 18:31:24,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=173400.0, ans=0.0 2023-12-21 18:31:26,794 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.590e+01 2.794e+01 3.040e+01 3.581e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 18:31:29,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173400.0, ans=0.1 2023-12-21 18:31:33,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=173400.0, ans=0.125 2023-12-21 18:31:38,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=173466.66666666666, ans=0.0 2023-12-21 18:31:45,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=173533.33333333334, ans=0.125 2023-12-21 18:31:46,025 INFO [train.py:886] (1/4) Epoch 6, batch 2200, loss[loss=0.0173, audio_tagging_loss=0.0173, over 24750.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4947356.41 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:31:50,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=173533.33333333334, ans=0.0 2023-12-21 18:32:00,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=173600.0, ans=0.0 2023-12-21 18:32:27,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-12-21 18:32:33,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=15.0 2023-12-21 18:32:37,604 INFO [train.py:886] (1/4) Epoch 6, batch 2250, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24030.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 4938090.11 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:32:53,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=173933.33333333334, ans=0.0 2023-12-21 18:33:02,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=174000.0, ans=0.0 2023-12-21 18:33:04,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. 
limit=15.0 2023-12-21 18:33:07,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=174066.66666666666, ans=0.2 2023-12-21 18:33:08,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=174066.66666666666, ans=0.02 2023-12-21 18:33:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=174066.66666666666, ans=0.0 2023-12-21 18:33:10,618 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.579e+01 2.731e+01 2.928e+01 3.593e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 18:33:18,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=174133.33333333334, ans=0.0 2023-12-21 18:33:30,083 INFO [train.py:886] (1/4) Epoch 6, batch 2300, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4939679.78 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:33:43,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=174266.66666666666, ans=0.125 2023-12-21 18:33:46,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=12.0 2023-12-21 18:34:07,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174400.0, ans=0.1 2023-12-21 18:34:21,990 INFO [train.py:886] (1/4) Epoch 6, batch 2350, loss[loss=0.0175, audio_tagging_loss=0.0175, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4947204.93 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:34:29,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0 2023-12-21 18:34:29,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-12-21 18:34:55,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.528e+01 2.689e+01 2.848e+01 3.552e+01, threshold=5.378e+01, percent-clipped=0.0 2023-12-21 18:35:13,770 INFO [train.py:886] (1/4) Epoch 6, batch 2400, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01619, audio_tagging_loss=0.01619, over 4950021.05 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:35:22,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2023-12-21 18:35:34,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=175000.0, ans=0.0 2023-12-21 18:35:44,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=175066.66666666666, ans=0.125 2023-12-21 18:36:05,898 INFO [train.py:886] (1/4) Epoch 6, batch 2450, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4953040.51 frames. 
], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:36:33,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=175333.33333333334, ans=0.0 2023-12-21 18:36:38,854 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.641e+01 2.797e+01 2.976e+01 3.945e+01, threshold=5.593e+01, percent-clipped=0.0 2023-12-21 18:36:48,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-12-21 18:36:53,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=175466.66666666666, ans=0.1 2023-12-21 18:36:57,347 INFO [train.py:886] (1/4) Epoch 6, batch 2500, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24750.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4951246.15 frames. ], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:37:13,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-21 18:37:14,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=175600.0, ans=0.125 2023-12-21 18:37:15,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=175600.0, ans=15.0 2023-12-21 18:37:24,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-21 18:37:26,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=175666.66666666666, ans=0.1 2023-12-21 18:37:30,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=175733.33333333334, ans=0.025 2023-12-21 18:37:33,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.40 vs. limit=10.0 2023-12-21 18:37:37,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=15.0 2023-12-21 18:37:42,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=175800.0, ans=0.125 2023-12-21 18:37:49,686 INFO [train.py:886] (1/4) Epoch 6, batch 2550, loss[loss=0.01779, audio_tagging_loss=0.01779, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4946246.48 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:38:01,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-12-21 18:38:02,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.38 vs. 
limit=15.0 2023-12-21 18:38:07,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175933.33333333334, ans=0.1 2023-12-21 18:38:08,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=175933.33333333334, ans=0.0 2023-12-21 18:38:14,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-12-21 18:38:22,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.592e+01 2.752e+01 3.040e+01 4.422e+01, threshold=5.504e+01, percent-clipped=0.0 2023-12-21 18:38:23,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2023-12-21 18:38:26,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0 2023-12-21 18:38:31,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.76 vs. limit=15.0 2023-12-21 18:38:41,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-12-21 18:38:42,276 INFO [train.py:886] (1/4) Epoch 6, batch 2600, loss[loss=0.01608, audio_tagging_loss=0.01608, over 25000.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4941566.18 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:38:56,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176266.66666666666, ans=0.1 2023-12-21 18:39:13,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2023-12-21 18:39:33,961 INFO [train.py:886] (1/4) Epoch 6, batch 2650, loss[loss=0.01931, audio_tagging_loss=0.01931, over 25000.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 4936943.17 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:39:35,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=176533.33333333334, ans=0.0 2023-12-21 18:40:01,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=176666.66666666666, ans=0.125 2023-12-21 18:40:07,085 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.558e+01 2.691e+01 2.831e+01 3.904e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 18:40:08,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=176733.33333333334, ans=0.125 2023-12-21 18:40:26,266 INFO [train.py:886] (1/4) Epoch 6, batch 2700, loss[loss=0.01484, audio_tagging_loss=0.01484, over 24750.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4940632.43 frames. 
], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:40:35,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176933.33333333334, ans=0.1 2023-12-21 18:40:57,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=177066.66666666666, ans=0.125 2023-12-21 18:40:59,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177066.66666666666, ans=0.0 2023-12-21 18:41:16,675 INFO [train.py:886] (1/4) Epoch 6, batch 2750, loss[loss=0.01601, audio_tagging_loss=0.01601, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4945774.37 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:41:21,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.14 vs. limit=22.5 2023-12-21 18:41:22,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177200.0, ans=0.1 2023-12-21 18:41:39,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=27.16 vs. limit=15.0 2023-12-21 18:41:49,368 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.557e+01 2.736e+01 2.928e+01 3.710e+01, threshold=5.471e+01, percent-clipped=0.0 2023-12-21 18:41:58,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=177466.66666666666, ans=0.125 2023-12-21 18:42:00,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0 2023-12-21 18:42:07,726 INFO [train.py:886] (1/4) Epoch 6, batch 2800, loss[loss=0.01418, audio_tagging_loss=0.01418, over 24750.00 frames. ], tot_loss[loss=0.01622, audio_tagging_loss=0.01622, over 4949346.24 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:42:08,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=177533.33333333334, ans=0.0 2023-12-21 18:42:12,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2023-12-21 18:42:12,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-12-21 18:42:15,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=177533.33333333334, ans=0.125 2023-12-21 18:42:24,210 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.218e-01 2023-12-21 18:42:24,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=177600.0, ans=0.125 2023-12-21 18:42:25,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.69 vs. 
limit=12.0 2023-12-21 18:42:33,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=177666.66666666666, ans=0.0 2023-12-21 18:42:59,863 INFO [train.py:886] (1/4) Epoch 6, batch 2850, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24750.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4944680.56 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:43:02,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-12-21 18:43:15,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=177933.33333333334, ans=0.125 2023-12-21 18:43:18,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177933.33333333334, ans=0.1 2023-12-21 18:43:21,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=178000.0, ans=0.125 2023-12-21 18:43:33,496 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.570e+01 2.729e+01 2.961e+01 3.657e+01, threshold=5.459e+01, percent-clipped=0.0 2023-12-21 18:43:36,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-12-21 18:43:51,167 INFO [train.py:886] (1/4) Epoch 6, batch 2900, loss[loss=0.01653, audio_tagging_loss=0.01653, over 21816.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4942567.93 frames. ], batch size: 107, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:43:55,879 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.025e-02 2023-12-21 18:44:02,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2023-12-21 18:44:08,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=178266.66666666666, ans=0.2 2023-12-21 18:44:17,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=178333.33333333334, ans=0.5 2023-12-21 18:44:22,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=178400.0, ans=0.125 2023-12-21 18:44:22,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=178400.0, ans=0.0 2023-12-21 18:44:42,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=178533.33333333334, ans=0.125 2023-12-21 18:44:42,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178533.33333333334, ans=0.0 2023-12-21 18:44:43,481 INFO [train.py:886] (1/4) Epoch 6, batch 2950, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24092.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4946248.78 frames. 
], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:44:44,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=178533.33333333334, ans=0.04949747468305833 2023-12-21 18:44:45,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=178533.33333333334, ans=0.125 2023-12-21 18:44:55,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=178600.0, ans=0.0 2023-12-21 18:45:07,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=178666.66666666666, ans=10.0 2023-12-21 18:45:13,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178666.66666666666, ans=0.1 2023-12-21 18:45:16,980 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.209e+01 2.523e+01 2.674e+01 2.981e+01 3.708e+01, threshold=5.347e+01, percent-clipped=0.0 2023-12-21 18:45:31,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-12-21 18:45:34,823 INFO [train.py:886] (1/4) Epoch 6, batch 3000, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01612, audio_tagging_loss=0.01612, over 4950994.85 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:45:34,824 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 18:45:56,020 INFO [train.py:917] (1/4) Epoch 6, validation: loss=0.03776, audio_tagging_loss=0.03776, over 3737520.00 frames. 2023-12-21 18:45:56,020 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 18:46:00,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-12-21 18:46:10,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.94 vs. limit=22.5 2023-12-21 18:46:18,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=179000.0, ans=0.2 2023-12-21 18:46:26,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179066.66666666666, ans=0.1 2023-12-21 18:46:34,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=179066.66666666666, ans=0.2 2023-12-21 18:46:42,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=179133.33333333334, ans=0.125 2023-12-21 18:46:48,375 INFO [train.py:886] (1/4) Epoch 6, batch 3050, loss[loss=0.01513, audio_tagging_loss=0.01513, over 25000.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4955440.94 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:46:48,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.04 vs. 
limit=22.5 2023-12-21 18:46:56,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=179200.0, ans=0.0 2023-12-21 18:47:03,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=179266.66666666666, ans=22.5 2023-12-21 18:47:13,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179333.33333333334, ans=0.0 2023-12-21 18:47:21,408 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.570e+01 2.697e+01 2.943e+01 3.684e+01, threshold=5.394e+01, percent-clipped=0.0 2023-12-21 18:47:32,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2023-12-21 18:47:40,091 INFO [train.py:886] (1/4) Epoch 6, batch 3100, loss[loss=0.01871, audio_tagging_loss=0.01871, over 24750.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4959719.72 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:47:44,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2023-12-21 18:47:48,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=179533.33333333334, ans=10.0 2023-12-21 18:48:06,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=179666.66666666666, ans=15.0 2023-12-21 18:48:31,648 INFO [train.py:886] (1/4) Epoch 6, batch 3150, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24750.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4953165.80 frames. ], batch size: 99, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:48:45,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.27 vs. limit=22.5 2023-12-21 18:48:50,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-12-21 18:49:04,097 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 2.612e+01 2.785e+01 2.963e+01 3.956e+01, threshold=5.570e+01, percent-clipped=0.0 2023-12-21 18:49:10,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180066.66666666666, ans=0.1 2023-12-21 18:49:23,229 INFO [train.py:886] (1/4) Epoch 6, batch 3200, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4952346.23 frames. 
], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:49:28,088 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:49:45,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=180333.33333333334, ans=0.2 2023-12-21 18:49:47,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=180333.33333333334, ans=0.0 2023-12-21 18:49:53,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=180400.0, ans=0.09899494936611666 2023-12-21 18:49:54,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2023-12-21 18:50:07,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=180466.66666666666, ans=0.125 2023-12-21 18:50:12,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=180466.66666666666, ans=0.1 2023-12-21 18:50:14,312 INFO [train.py:886] (1/4) Epoch 6, batch 3250, loss[loss=0.01922, audio_tagging_loss=0.01922, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4956169.31 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:50:17,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-12-21 18:50:19,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=180533.33333333334, ans=0.2 2023-12-21 18:50:35,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=180666.66666666666, ans=0.2 2023-12-21 18:50:38,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=180666.66666666666, ans=0.2 2023-12-21 18:50:46,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-12-21 18:50:47,528 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.514e+01 2.742e+01 2.966e+01 4.089e+01, threshold=5.485e+01, percent-clipped=0.0 2023-12-21 18:50:59,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=180800.0, ans=0.1 2023-12-21 18:51:05,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-12-21 18:51:06,744 INFO [train.py:886] (1/4) Epoch 6, batch 3300, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4958246.19 frames. 
], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:51:06,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=180866.66666666666, ans=0.125 2023-12-21 18:51:11,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180866.66666666666, ans=0.1 2023-12-21 18:51:13,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180866.66666666666, ans=0.1 2023-12-21 18:51:17,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=180933.33333333334, ans=0.0 2023-12-21 18:51:22,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. limit=5.0 2023-12-21 18:51:23,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.82 vs. limit=10.0 2023-12-21 18:51:27,075 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:51:27,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=15.0 2023-12-21 18:51:36,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=181000.0, ans=0.125 2023-12-21 18:51:43,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-12-21 18:51:44,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=181066.66666666666, ans=0.0 2023-12-21 18:51:54,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181133.33333333334, ans=0.1 2023-12-21 18:51:55,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.40 vs. limit=22.5 2023-12-21 18:51:59,250 INFO [train.py:886] (1/4) Epoch 6, batch 3350, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4962620.12 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:52:06,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=15.0 2023-12-21 18:52:07,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=181200.0, ans=0.125 2023-12-21 18:52:14,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=181266.66666666666, ans=6.0 2023-12-21 18:52:14,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-12-21 18:52:17,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. 
limit=15.0 2023-12-21 18:52:30,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2023-12-21 18:52:32,376 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.583e+01 2.776e+01 2.913e+01 4.067e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 18:52:46,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-12-21 18:52:46,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=181466.66666666666, ans=0.2 2023-12-21 18:52:50,298 INFO [train.py:886] (1/4) Epoch 6, batch 3400, loss[loss=0.01713, audio_tagging_loss=0.01713, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4958361.65 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:52:54,996 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.902e+00 2023-12-21 18:53:12,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=181666.66666666666, ans=0.125 2023-12-21 18:53:15,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=181666.66666666666, ans=10.0 2023-12-21 18:53:20,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=181733.33333333334, ans=0.125 2023-12-21 18:53:28,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=181733.33333333334, ans=0.2 2023-12-21 18:53:32,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.82 vs. limit=15.0 2023-12-21 18:53:42,540 INFO [train.py:886] (1/4) Epoch 6, batch 3450, loss[loss=0.01702, audio_tagging_loss=0.01702, over 24750.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4949860.18 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:53:52,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181933.33333333334, ans=0.0 2023-12-21 18:54:08,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=182000.0, ans=0.0 2023-12-21 18:54:15,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.567e+01 2.759e+01 2.912e+01 3.537e+01, threshold=5.518e+01, percent-clipped=0.0 2023-12-21 18:54:16,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=182066.66666666666, ans=0.125 2023-12-21 18:54:32,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=182133.33333333334, ans=0.125 2023-12-21 18:54:34,847 INFO [train.py:886] (1/4) Epoch 6, batch 3500, loss[loss=0.01643, audio_tagging_loss=0.01643, over 24750.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4937521.04 frames. 
], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:54:55,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=182333.33333333334, ans=0.125 2023-12-21 18:54:58,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=182333.33333333334, ans=0.2 2023-12-21 18:55:00,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=182333.33333333334, ans=0.125 2023-12-21 18:55:03,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.00 vs. limit=22.5 2023-12-21 18:55:14,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=182400.0, ans=0.0 2023-12-21 18:55:21,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.50 vs. limit=15.0 2023-12-21 18:55:26,237 INFO [train.py:886] (1/4) Epoch 6, batch 3550, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4938283.22 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:55:30,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=182533.33333333334, ans=0.2 2023-12-21 18:55:32,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-12-21 18:55:32,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=182533.33333333334, ans=0.125 2023-12-21 18:55:35,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2023-12-21 18:55:58,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=182733.33333333334, ans=0.04949747468305833 2023-12-21 18:55:59,189 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.568e+01 2.734e+01 3.047e+01 3.818e+01, threshold=5.468e+01, percent-clipped=0.0 2023-12-21 18:56:03,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=182733.33333333334, ans=0.05 2023-12-21 18:56:18,368 INFO [train.py:886] (1/4) Epoch 6, batch 3600, loss[loss=0.02195, audio_tagging_loss=0.02195, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4938256.53 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:56:25,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=182866.66666666666, ans=0.125 2023-12-21 18:56:47,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=183000.0, ans=0.125 2023-12-21 18:57:08,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.27 vs. 
limit=10.0 2023-12-21 18:57:09,937 INFO [train.py:886] (1/4) Epoch 6, batch 3650, loss[loss=0.0192, audio_tagging_loss=0.0192, over 25000.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4945063.11 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:57:22,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183266.66666666666, ans=0.125 2023-12-21 18:57:34,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183333.33333333334, ans=0.125 2023-12-21 18:57:43,147 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.536e+01 2.775e+01 2.969e+01 4.342e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 18:57:49,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=15.0 2023-12-21 18:57:50,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=183466.66666666666, ans=0.125 2023-12-21 18:57:55,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=183466.66666666666, ans=0.125 2023-12-21 18:57:58,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=183466.66666666666, ans=0.0 2023-12-21 18:58:00,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=183466.66666666666, ans=0.0 2023-12-21 18:58:01,817 INFO [train.py:886] (1/4) Epoch 6, batch 3700, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4951995.54 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:58:07,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=183533.33333333334, ans=0.125 2023-12-21 18:58:08,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=183533.33333333334, ans=0.2 2023-12-21 18:58:12,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=183600.0, ans=0.125 2023-12-21 18:58:30,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=183666.66666666666, ans=0.125 2023-12-21 18:58:40,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=183733.33333333334, ans=0.125 2023-12-21 18:58:49,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=183800.0, ans=0.125 2023-12-21 18:58:50,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=183800.0, ans=0.95 2023-12-21 18:58:54,274 INFO [train.py:886] (1/4) Epoch 6, batch 3750, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4953555.80 frames. 
], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:59:00,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=183866.66666666666, ans=0.0 2023-12-21 18:59:12,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=183933.33333333334, ans=0.0 2023-12-21 18:59:28,152 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.576e+01 2.747e+01 2.976e+01 3.504e+01, threshold=5.494e+01, percent-clipped=0.0 2023-12-21 18:59:40,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=184133.33333333334, ans=0.125 2023-12-21 18:59:45,105 INFO [train.py:886] (1/4) Epoch 6, batch 3800, loss[loss=0.01946, audio_tagging_loss=0.01946, over 22680.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4939421.76 frames. ], batch size: 107, lr: 1.74e-02, grad_scale: 128.0 2023-12-21 18:59:47,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=184200.0, ans=0.125 2023-12-21 18:59:54,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=184200.0, ans=0.0 2023-12-21 18:59:58,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=184266.66666666666, ans=0.125 2023-12-21 19:00:02,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184266.66666666666, ans=0.1 2023-12-21 19:00:35,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.50 vs. limit=15.0 2023-12-21 19:00:37,460 INFO [train.py:886] (1/4) Epoch 6, batch 3850, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4942868.85 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 128.0 2023-12-21 19:00:45,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-12-21 19:00:50,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.99 vs. 
limit=22.5 2023-12-21 19:00:57,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=184666.66666666666, ans=0.0 2023-12-21 19:01:05,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=184666.66666666666, ans=0.125 2023-12-21 19:01:11,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.661e+01 2.816e+01 3.118e+01 3.976e+01, threshold=5.631e+01, percent-clipped=0.0 2023-12-21 19:01:14,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184733.33333333334, ans=0.1 2023-12-21 19:01:24,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=184800.0, ans=0.95 2023-12-21 19:01:29,363 INFO [train.py:886] (1/4) Epoch 6, batch 3900, loss[loss=0.01611, audio_tagging_loss=0.01611, over 25000.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 4950912.74 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:01:40,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=184933.33333333334, ans=0.0 2023-12-21 19:01:42,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=15.0 2023-12-21 19:01:43,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184933.33333333334, ans=0.1 2023-12-21 19:01:44,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=184933.33333333334, ans=0.125 2023-12-21 19:01:50,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=185000.0, ans=0.2 2023-12-21 19:01:53,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=185000.0, ans=0.0 2023-12-21 19:01:58,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=185000.0, ans=0.125 2023-12-21 19:01:59,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=185066.66666666666, ans=22.5 2023-12-21 19:02:20,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=185200.0, ans=0.2 2023-12-21 19:02:20,936 INFO [train.py:886] (1/4) Epoch 6, batch 3950, loss[loss=0.01704, audio_tagging_loss=0.01704, over 25000.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4949130.67 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:02:22,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. 
limit=15.0 2023-12-21 19:02:23,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=185200.0, ans=10.0 2023-12-21 19:02:23,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=185200.0, ans=0.125 2023-12-21 19:02:26,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.69 vs. limit=22.5 2023-12-21 19:02:42,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.75 vs. limit=15.0 2023-12-21 19:02:52,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=185400.0, ans=0.2 2023-12-21 19:02:55,023 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.574e+01 2.731e+01 2.913e+01 3.749e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 19:02:55,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=185400.0, ans=0.0 2023-12-21 19:02:57,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=185400.0, ans=0.0 2023-12-21 19:03:11,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=185466.66666666666, ans=0.125 2023-12-21 19:03:13,950 INFO [train.py:886] (1/4) Epoch 6, batch 4000, loss[loss=0.01606, audio_tagging_loss=0.01606, over 25000.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4954278.68 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:03:57,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185800.0, ans=0.1 2023-12-21 19:04:03,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=185866.66666666666, ans=0.0 2023-12-21 19:04:04,182 INFO [train.py:886] (1/4) Epoch 6, batch 4050, loss[loss=0.01986, audio_tagging_loss=0.01986, over 25000.00 frames. ], tot_loss[loss=0.01638, audio_tagging_loss=0.01638, over 4956406.64 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:04:11,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=185866.66666666666, ans=0.025 2023-12-21 19:04:15,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185933.33333333334, ans=0.125 2023-12-21 19:04:38,166 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.667e+01 2.852e+01 3.052e+01 4.692e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-21 19:04:44,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0 2023-12-21 19:04:45,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2023-12-21 19:04:53,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.33 vs. 
limit=15.0 2023-12-21 19:04:53,776 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.337e-01 2023-12-21 19:04:56,394 INFO [train.py:886] (1/4) Epoch 6, batch 4100, loss[loss=0.01762, audio_tagging_loss=0.01762, over 24750.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 4952054.39 frames. ], batch size: 99, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:05:09,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=15.0 2023-12-21 19:05:13,224 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:05:18,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.29 vs. limit=22.5 2023-12-21 19:05:30,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=186400.0, ans=0.0 2023-12-21 19:05:30,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=186400.0, ans=0.125 2023-12-21 19:05:36,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=186466.66666666666, ans=0.125 2023-12-21 19:05:36,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=186466.66666666666, ans=0.125 2023-12-21 19:05:39,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=186466.66666666666, ans=0.0 2023-12-21 19:05:47,588 INFO [train.py:886] (1/4) Epoch 6, batch 4150, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4948307.65 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:05:51,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=12.0 2023-12-21 19:06:15,395 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:06:17,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=186666.66666666666, ans=0.125 2023-12-21 19:06:23,379 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.572e+01 2.768e+01 2.919e+01 3.427e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 19:06:27,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=186733.33333333334, ans=0.2 2023-12-21 19:06:33,629 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:06:36,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=186800.0, ans=0.0 2023-12-21 19:06:41,008 INFO [train.py:886] (1/4) Epoch 6, batch 4200, loss[loss=0.01612, audio_tagging_loss=0.01612, over 24750.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4947566.79 frames. 
], batch size: 99, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:07:04,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=187000.0, ans=0.125 2023-12-21 19:07:14,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=12.0 2023-12-21 19:07:16,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187066.66666666666, ans=0.1 2023-12-21 19:07:16,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=187066.66666666666, ans=0.125 2023-12-21 19:07:22,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.69 vs. limit=22.5 2023-12-21 19:07:23,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=187133.33333333334, ans=0.05 2023-12-21 19:07:30,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=187133.33333333334, ans=0.2 2023-12-21 19:07:33,781 INFO [train.py:886] (1/4) Epoch 6, batch 4250, loss[loss=0.01672, audio_tagging_loss=0.01672, over 24750.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4946659.18 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:07:50,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=187266.66666666666, ans=0.2 2023-12-21 19:07:55,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=187333.33333333334, ans=0.5 2023-12-21 19:07:59,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0 2023-12-21 19:08:01,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187333.33333333334, ans=0.1 2023-12-21 19:08:07,998 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.574e+01 2.753e+01 2.984e+01 3.993e+01, threshold=5.507e+01, percent-clipped=0.0 2023-12-21 19:08:16,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2023-12-21 19:08:18,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187466.66666666666, ans=0.1 2023-12-21 19:08:21,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=15.0 2023-12-21 19:08:24,687 INFO [train.py:886] (1/4) Epoch 6, batch 4300, loss[loss=0.01397, audio_tagging_loss=0.01397, over 23921.00 frames. ], tot_loss[loss=0.01619, audio_tagging_loss=0.01619, over 4948343.39 frames. 
], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:08:41,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=187600.0, ans=0.125 2023-12-21 19:09:07,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=187800.0, ans=0.035 2023-12-21 19:09:13,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=15.0 2023-12-21 19:09:17,053 INFO [train.py:886] (1/4) Epoch 6, batch 4350, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4945527.89 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:09:20,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=187866.66666666666, ans=0.0 2023-12-21 19:09:20,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=187866.66666666666, ans=0.0 2023-12-21 19:09:27,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=187933.33333333334, ans=0.125 2023-12-21 19:09:30,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=187933.33333333334, ans=0.2 2023-12-21 19:09:38,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=188000.0, ans=0.125 2023-12-21 19:09:51,261 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.707e+01 2.868e+01 3.047e+01 3.925e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-21 19:10:08,776 INFO [train.py:886] (1/4) Epoch 6, batch 4400, loss[loss=0.01828, audio_tagging_loss=0.01828, over 24945.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4943078.04 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:10:08,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=188200.0, ans=0.125 2023-12-21 19:10:14,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=188200.0, ans=0.07 2023-12-21 19:10:16,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188200.0, ans=0.125 2023-12-21 19:11:00,378 INFO [train.py:886] (1/4) Epoch 6, batch 4450, loss[loss=0.02008, audio_tagging_loss=0.02008, over 22682.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4940459.38 frames. 
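
The `Whitening: ... metric=M vs. limit=L` records from `scaling.py:1022` compare a whiteness statistic of a module's output covariance against a scheduled limit; a penalty is applied only when the metric exceeds the limit. One standard statistic consistent with these logs is the mean squared eigenvalue of the covariance divided by the square of its mean eigenvalue: it is exactly 1.0 for isotropic ("white") features and grows with anisotropy. The sketch below assumes that definition and ignores the per-group split implied by `num_groups`, so it is illustrative rather than icefall's exact computation:

```python
# Whiteness statistic in the spirit of "metric=... vs. limit=..." above:
# mean(eig^2) / mean(eig)^2 of the feature covariance, computed via traces.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]            # (C, C) covariance
    n = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()   # tr(C) / n == mean eigenvalue
    mean_sq_eig = (cov * cov).sum() / n     # tr(C @ C) / n == mean(eig^2)
    return mean_sq_eig / (mean_eig ** 2 + 1e-20)

x = torch.randn(100000, 64)                 # nearly white features
print(float(whitening_metric(x)))           # ~1.0; anisotropy drives it up
```
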
], batch size: 107, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:11:27,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=188666.66666666666, ans=0.1 2023-12-21 19:11:34,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=188733.33333333334, ans=0.125 2023-12-21 19:11:35,072 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.682e+01 2.838e+01 3.055e+01 3.746e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 19:11:47,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=188800.0, ans=0.1 2023-12-21 19:11:52,453 INFO [train.py:886] (1/4) Epoch 6, batch 4500, loss[loss=0.0162, audio_tagging_loss=0.0162, over 25000.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 4942688.89 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:12:01,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=8.0 2023-12-21 19:12:13,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.30 vs. limit=22.5 2023-12-21 19:12:15,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-12-21 19:12:25,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=189066.66666666666, ans=0.0 2023-12-21 19:12:44,043 INFO [train.py:886] (1/4) Epoch 6, batch 4550, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4942325.33 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:12:52,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=189200.0, ans=0.125 2023-12-21 19:12:57,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=189266.66666666666, ans=0.0 2023-12-21 19:12:59,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189266.66666666666, ans=0.125 2023-12-21 19:13:13,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2023-12-21 19:13:18,809 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.583e+01 2.791e+01 2.970e+01 3.966e+01, threshold=5.581e+01, percent-clipped=0.0 2023-12-21 19:13:25,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=189466.66666666666, ans=0.1 2023-12-21 19:13:36,224 INFO [train.py:886] (1/4) Epoch 6, batch 4600, loss[loss=0.01793, audio_tagging_loss=0.01793, over 25000.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4949401.95 frames. 
], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:13:38,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189533.33333333334, ans=0.1 2023-12-21 19:13:48,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=189600.0, ans=0.05 2023-12-21 19:14:00,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=189666.66666666666, ans=0.125 2023-12-21 19:14:00,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=189666.66666666666, ans=0.0 2023-12-21 19:14:06,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2023-12-21 19:14:15,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=189733.33333333334, ans=0.125 2023-12-21 19:14:23,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=189800.0, ans=0.1 2023-12-21 19:14:27,548 INFO [train.py:886] (1/4) Epoch 6, batch 4650, loss[loss=0.01804, audio_tagging_loss=0.01804, over 25000.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4955196.76 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:14:35,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=189866.66666666666, ans=0.125 2023-12-21 19:14:57,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=190000.0, ans=0.125 2023-12-21 19:15:02,399 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.615e+01 2.807e+01 2.981e+01 3.491e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-21 19:15:02,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=190066.66666666666, ans=0.0 2023-12-21 19:15:18,003 INFO [train.py:886] (1/4) Epoch 6, batch 4700, loss[loss=0.01839, audio_tagging_loss=0.01839, over 24750.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 4947502.05 frames. ], batch size: 99, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:15:37,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=190333.33333333334, ans=0.0 2023-12-21 19:15:47,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5 2023-12-21 19:15:56,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=190466.66666666666, ans=0.07 2023-12-21 19:15:57,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0 2023-12-21 19:16:05,841 INFO [train.py:886] (1/4) Epoch 6, batch 4750, loss[loss=0.01806, audio_tagging_loss=0.01806, over 24750.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4942551.45 frames. 
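
The `WARNING [optim.py:484]` lines summarize ScaledAdam's adaptive gradient clipping: the five `grad-norm quartiles` are min/Q1/median/Q3/max of recent gradient norms, and in every record above the threshold equals `Clipping_scale` times the median (e.g. 2.0 × 2.807e+01 ≈ 5.613e+01); `percent-clipped` is the share of recent steps whose gradients were actually scaled down. A standalone sketch of that mechanism (icefall implements it inside the optimizer with per-parameter scaling, so this only mirrors the logged relationship):

```python
# Median-based gradient clipping matching the logged relationship
# threshold = Clipping_scale * median(recent grad norms). Standalone sketch;
# icefall's ScaledAdam does this inside the optimizer itself.
import torch

def clip_gradients_(params, norm_history, clipping_scale=2.0, window=128):
    grads = [p.grad for p in params if p.grad is not None]
    tot_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    norm_history.append(float(tot_norm))
    del norm_history[:-window]                   # sliding window of norms
    median = sorted(norm_history)[len(norm_history) // 2]
    threshold = clipping_scale * median
    clipped = float(tot_norm) > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / float(tot_norm))  # scale down, don't zero
    return tot_norm, clipped                     # feeds percent-clipped stats
```
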
], batch size: 99, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:16:13,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190533.33333333334, ans=0.1 2023-12-21 19:16:16,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2023-12-21 19:16:43,666 INFO [train.py:886] (1/4) Epoch 7, batch 0, loss[loss=0.0473, audio_tagging_loss=0.0473, over 20780.00 frames. ], tot_loss[loss=0.0473, audio_tagging_loss=0.0473, over 20780.00 frames. ], batch size: 107, lr: 1.61e-02, grad_scale: 64.0 2023-12-21 19:16:43,667 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 19:17:05,424 INFO [train.py:917] (1/4) Epoch 7, validation: loss=0.03667, audio_tagging_loss=0.03667, over 3737520.00 frames. 2023-12-21 19:17:05,425 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 19:17:07,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=190640.0, ans=0.0 2023-12-21 19:17:23,800 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.615e+01 2.821e+01 3.087e+01 1.022e+02, threshold=5.642e+01, percent-clipped=4.0 2023-12-21 19:17:32,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=190773.33333333334, ans=0.0 2023-12-21 19:17:33,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2023-12-21 19:17:40,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=190840.0, ans=0.0 2023-12-21 19:17:46,207 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.401e-02 2023-12-21 19:17:56,689 INFO [train.py:886] (1/4) Epoch 7, batch 50, loss[loss=0.02153, audio_tagging_loss=0.02153, over 25000.00 frames. ], tot_loss[loss=0.02539, audio_tagging_loss=0.02539, over 1121099.93 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 32.0 2023-12-21 19:18:03,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=190973.33333333334, ans=0.125 2023-12-21 19:18:28,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2023-12-21 19:18:47,575 INFO [train.py:886] (1/4) Epoch 7, batch 100, loss[loss=0.0168, audio_tagging_loss=0.0168, over 25000.00 frames. ], tot_loss[loss=0.02211, audio_tagging_loss=0.02211, over 1971574.30 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 32.0 2023-12-21 19:18:48,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2023-12-21 19:18:55,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=191306.66666666666, ans=0.2 2023-12-21 19:19:01,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.05 vs. 
limit=15.0 2023-12-21 19:19:05,817 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.945e+01 3.158e+01 3.404e+01 4.637e+01, threshold=6.317e+01, percent-clipped=0.0 2023-12-21 19:19:17,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191440.0, ans=0.125 2023-12-21 19:19:20,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=191506.66666666666, ans=0.125 2023-12-21 19:19:27,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191573.33333333334, ans=0.1 2023-12-21 19:19:33,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-21 19:19:36,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=191573.33333333334, ans=0.0 2023-12-21 19:19:38,901 INFO [train.py:886] (1/4) Epoch 7, batch 150, loss[loss=0.017, audio_tagging_loss=0.017, over 25000.00 frames. ], tot_loss[loss=0.02, audio_tagging_loss=0.02, over 2638547.24 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:19:40,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.50 vs. limit=22.5 2023-12-21 19:19:50,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191706.66666666666, ans=0.125 2023-12-21 19:19:56,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=191706.66666666666, ans=0.125 2023-12-21 19:19:56,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.67 vs. limit=22.5 2023-12-21 19:20:09,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-12-21 19:20:10,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=191840.0, ans=0.0 2023-12-21 19:20:17,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=191840.0, ans=0.125 2023-12-21 19:20:23,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=191906.66666666666, ans=0.125 2023-12-21 19:20:25,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=191906.66666666666, ans=0.1 2023-12-21 19:20:29,290 INFO [train.py:886] (1/4) Epoch 7, batch 200, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 3152919.38 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:20:45,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.92 vs. 
limit=22.5 2023-12-21 19:20:48,759 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.567e+01 2.755e+01 2.935e+01 3.522e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 19:21:00,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=192173.33333333334, ans=15.0 2023-12-21 19:21:05,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.53 vs. limit=22.5 2023-12-21 19:21:11,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=192240.0, ans=0.0 2023-12-21 19:21:22,180 INFO [train.py:886] (1/4) Epoch 7, batch 250, loss[loss=0.01522, audio_tagging_loss=0.01522, over 25000.00 frames. ], tot_loss[loss=0.01798, audio_tagging_loss=0.01798, over 3552012.06 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:21:59,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=192506.66666666666, ans=0.125 2023-12-21 19:22:13,475 INFO [train.py:886] (1/4) Epoch 7, batch 300, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.01756, audio_tagging_loss=0.01756, over 3856886.42 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:22:18,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2023-12-21 19:22:26,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=192706.66666666666, ans=0.125 2023-12-21 19:22:31,661 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.537e+01 2.670e+01 2.875e+01 3.479e+01, threshold=5.340e+01, percent-clipped=0.0 2023-12-21 19:22:42,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=192840.0, ans=0.125 2023-12-21 19:22:46,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0 2023-12-21 19:23:04,682 INFO [train.py:886] (1/4) Epoch 7, batch 350, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4098475.67 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:23:08,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=12.0 2023-12-21 19:23:14,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=193040.0, ans=0.0 2023-12-21 19:23:34,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2023-12-21 19:23:43,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0 2023-12-21 19:23:56,034 INFO [train.py:886] (1/4) Epoch 7, batch 400, loss[loss=0.01751, audio_tagging_loss=0.01751, over 25000.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4287673.97 frames. 
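
Each epoch opens with a full validation pass (`train.py:909`/`917`) over the same 3,737,520 frames, followed by a peak-memory report (`train.py:918`). A hedged sketch of that bookkeeping; `loss_fn` and its return convention are assumptions, not icefall's API:

```python
# Validation-loss + peak-memory bookkeeping like the records above.
# loss_fn(model, batch) -> (summed loss, num frames) is an assumed helper.
import torch

@torch.no_grad()
def validate(model, valid_loader, loss_fn, device="cuda:1"):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    for batch in valid_loader:
        loss, n_frames = loss_fn(model, batch)
        loss_sum += float(loss)
        frames += n_frames
    model.train()
    # Peak allocation since the start of the process, reported in MB.
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return loss_sum / frames, peak_mb  # e.g. loss=0.03667 over 3737520 frames
```
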
], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:24:00,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=193306.66666666666, ans=0.125 2023-12-21 19:24:01,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=193306.66666666666, ans=0.05 2023-12-21 19:24:15,296 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.543e+01 2.742e+01 2.935e+01 3.819e+01, threshold=5.484e+01, percent-clipped=0.0 2023-12-21 19:24:17,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=193440.0, ans=0.125 2023-12-21 19:24:25,643 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.669e+00 2023-12-21 19:24:26,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2023-12-21 19:24:26,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=193506.66666666666, ans=0.1 2023-12-21 19:24:35,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=193506.66666666666, ans=0.05 2023-12-21 19:24:41,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.28 vs. limit=15.0 2023-12-21 19:24:48,529 INFO [train.py:886] (1/4) Epoch 7, batch 450, loss[loss=0.01714, audio_tagging_loss=0.01714, over 24014.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4442286.90 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:24:52,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=193640.0, ans=0.2 2023-12-21 19:24:55,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=193640.0, ans=0.2 2023-12-21 19:25:00,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.82 vs. limit=22.5 2023-12-21 19:25:02,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.83 vs. limit=22.5 2023-12-21 19:25:17,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=193773.33333333334, ans=0.05 2023-12-21 19:25:31,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=193906.66666666666, ans=0.125 2023-12-21 19:25:32,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=193906.66666666666, ans=0.09899494936611666 2023-12-21 19:25:36,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=12.0 2023-12-21 19:25:40,743 INFO [train.py:886] (1/4) Epoch 7, batch 500, loss[loss=0.01161, audio_tagging_loss=0.01161, over 25000.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 4556281.95 frames. 
], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:25:41,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=193973.33333333334, ans=0.125 2023-12-21 19:25:55,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=12.0 2023-12-21 19:25:58,612 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.495e+01 2.691e+01 2.855e+01 3.742e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 19:26:00,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=194106.66666666666, ans=0.0 2023-12-21 19:26:04,183 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:26:04,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=194106.66666666666, ans=0.125 2023-12-21 19:26:19,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=194173.33333333334, ans=0.125 2023-12-21 19:26:28,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194240.0, ans=0.1 2023-12-21 19:26:30,866 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:26:31,531 INFO [train.py:886] (1/4) Epoch 7, batch 550, loss[loss=0.0158, audio_tagging_loss=0.0158, over 25000.00 frames. ], tot_loss[loss=0.01636, audio_tagging_loss=0.01636, over 4643966.04 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:26:39,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=22.5 2023-12-21 19:26:40,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=194306.66666666666, ans=0.2 2023-12-21 19:26:48,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=194373.33333333334, ans=0.0 2023-12-21 19:26:52,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=194440.0, ans=0.2 2023-12-21 19:27:02,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=15.0 2023-12-21 19:27:02,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=194506.66666666666, ans=0.0 2023-12-21 19:27:23,615 INFO [train.py:886] (1/4) Epoch 7, batch 600, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4703540.41 frames. 
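
The `lr:` field follows an Eden-style schedule (`base_lr` 0.045, `lr_batches` 7500, `lr_epochs` 3.5) that decays polynomially in both the batch index and the epoch. The relative step at the epoch-6/7 boundary checks out under the formula documented in icefall's `optim.py`: the epoch factor shrinks by `(61.25/48.25)**-0.25 ≈ 0.942`, consistent with the logged move from `lr: 1.72e-02` to `lr: 1.61e-02`. Absolute values also involve warmup and other implementation details, so treat this as a shape sketch only:

```python
# Eden-style LR shape (formula per icefall's optim.py docstring; warmup and
# other details omitted, so only the relative decay is expected to match).
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, 190640, 7) / eden_lr(0.045, 190506, 6))  # ~0.942
```
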
], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:27:23,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=194640.0, ans=0.0 2023-12-21 19:27:24,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=194640.0, ans=0.125 2023-12-21 19:27:29,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.31 vs. limit=10.0 2023-12-21 19:27:33,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194706.66666666666, ans=0.1 2023-12-21 19:27:36,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=194706.66666666666, ans=0.125 2023-12-21 19:27:38,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2023-12-21 19:27:42,308 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.620e+01 2.779e+01 2.985e+01 3.932e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 19:27:47,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=194773.33333333334, ans=0.125 2023-12-21 19:28:01,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.09 vs. limit=15.0 2023-12-21 19:28:07,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=194906.66666666666, ans=0.125 2023-12-21 19:28:10,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194906.66666666666, ans=0.1 2023-12-21 19:28:14,645 INFO [train.py:886] (1/4) Epoch 7, batch 650, loss[loss=0.01799, audio_tagging_loss=0.01799, over 22609.00 frames. ], tot_loss[loss=0.01643, audio_tagging_loss=0.01643, over 4743429.95 frames. ], batch size: 107, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:28:30,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=195040.0, ans=0.125 2023-12-21 19:28:51,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=195173.33333333334, ans=0.125 2023-12-21 19:28:52,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.54 vs. limit=22.5 2023-12-21 19:29:05,862 INFO [train.py:886] (1/4) Epoch 7, batch 700, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 4789386.61 frames. 
], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:29:24,333 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.528e+01 2.672e+01 2.888e+01 3.469e+01, threshold=5.344e+01, percent-clipped=0.0 2023-12-21 19:29:25,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=195440.0, ans=0.0 2023-12-21 19:29:28,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-12-21 19:29:31,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=195440.0, ans=0.1 2023-12-21 19:29:32,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=195440.0, ans=0.0 2023-12-21 19:29:55,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=195573.33333333334, ans=0.125 2023-12-21 19:29:56,836 INFO [train.py:886] (1/4) Epoch 7, batch 750, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4826004.97 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:29:58,023 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.735e+00 2023-12-21 19:30:07,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-12-21 19:30:11,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=195706.66666666666, ans=0.1 2023-12-21 19:30:20,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=195773.33333333334, ans=0.2 2023-12-21 19:30:44,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-12-21 19:30:45,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=195906.66666666666, ans=0.125 2023-12-21 19:30:46,833 INFO [train.py:886] (1/4) Epoch 7, batch 800, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4859393.78 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:31:05,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.570e+01 2.779e+01 3.006e+01 3.610e+01, threshold=5.558e+01, percent-clipped=0.0 2023-12-21 19:31:09,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=196106.66666666666, ans=0.2 2023-12-21 19:31:27,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-12-21 19:31:35,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=196240.0, ans=0.125 2023-12-21 19:31:35,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.85 vs. 
limit=15.0 2023-12-21 19:31:38,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196306.66666666666, ans=0.1 2023-12-21 19:31:39,177 INFO [train.py:886] (1/4) Epoch 7, batch 850, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 4883610.73 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:31:39,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=196306.66666666666, ans=0.0 2023-12-21 19:31:42,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=196306.66666666666, ans=0.0 2023-12-21 19:32:00,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=196440.0, ans=0.125 2023-12-21 19:32:05,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=196440.0, ans=0.125 2023-12-21 19:32:08,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=196440.0, ans=0.125 2023-12-21 19:32:14,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=196506.66666666666, ans=0.2 2023-12-21 19:32:31,639 INFO [train.py:886] (1/4) Epoch 7, batch 900, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4895436.41 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:32:32,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196640.0, ans=0.1 2023-12-21 19:32:35,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196640.0, ans=0.1 2023-12-21 19:32:41,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196706.66666666666, ans=0.1 2023-12-21 19:32:50,059 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.564e+01 2.733e+01 2.886e+01 3.706e+01, threshold=5.467e+01, percent-clipped=0.0 2023-12-21 19:32:55,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=196773.33333333334, ans=0.125 2023-12-21 19:33:22,390 INFO [train.py:886] (1/4) Epoch 7, batch 950, loss[loss=0.0147, audio_tagging_loss=0.0147, over 24750.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 4896752.13 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:33:26,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=196973.33333333334, ans=0.125 2023-12-21 19:33:33,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=197040.0, ans=0.2 2023-12-21 19:34:14,447 INFO [train.py:886] (1/4) Epoch 7, batch 1000, loss[loss=0.01686, audio_tagging_loss=0.01686, over 24750.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 4910911.41 frames. 
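
In the `train.py:886` records, `loss[...]` is the current batch and `tot_loss[...]` a smoothed aggregate: a frame-weighted loss sum decayed by `1 - 1/reset_interval` per batch (`reset_interval` is 200 in this run). That decay is what keeps the "over N frames" count saturated near 200 batches × ~25,000 frames ≈ 5e6 instead of growing without bound. A sketch with icefall's `MetricsTracker` details simplified:

```python
# Decayed, frame-weighted loss aggregate like "tot_loss[...] over N frames".
class DecayingLossTracker:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames   # reported as tot_loss
```
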
], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:34:32,233 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.519e+01 2.705e+01 2.941e+01 3.391e+01, threshold=5.409e+01, percent-clipped=0.0 2023-12-21 19:34:42,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2023-12-21 19:34:46,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=197506.66666666666, ans=0.07 2023-12-21 19:34:51,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=197506.66666666666, ans=0.025 2023-12-21 19:34:54,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197573.33333333334, ans=0.1 2023-12-21 19:35:05,196 INFO [train.py:886] (1/4) Epoch 7, batch 1050, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4920012.53 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:35:13,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=197640.0, ans=0.0 2023-12-21 19:35:19,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=197706.66666666666, ans=0.0 2023-12-21 19:35:26,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=12.0 2023-12-21 19:35:28,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=197773.33333333334, ans=0.125 2023-12-21 19:35:34,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=197773.33333333334, ans=0.0 2023-12-21 19:35:49,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=197906.66666666666, ans=0.125 2023-12-21 19:35:52,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=197906.66666666666, ans=0.0 2023-12-21 19:35:53,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=197906.66666666666, ans=0.0 2023-12-21 19:35:56,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=197906.66666666666, ans=0.0 2023-12-21 19:35:57,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-12-21 19:35:57,693 INFO [train.py:886] (1/4) Epoch 7, batch 1100, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4928380.66 frames. 
], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:36:03,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=197973.33333333334, ans=0.125 2023-12-21 19:36:03,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=197973.33333333334, ans=0.0 2023-12-21 19:36:15,999 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.563e+01 2.707e+01 2.877e+01 3.432e+01, threshold=5.414e+01, percent-clipped=0.0 2023-12-21 19:36:19,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=198106.66666666666, ans=0.05 2023-12-21 19:36:28,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=198173.33333333334, ans=0.125 2023-12-21 19:36:33,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2023-12-21 19:36:49,288 INFO [train.py:886] (1/4) Epoch 7, batch 1150, loss[loss=0.01421, audio_tagging_loss=0.01421, over 25000.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4933188.39 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:37:06,602 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:37:16,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=198440.0, ans=0.0 2023-12-21 19:37:17,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=198440.0, ans=0.125 2023-12-21 19:37:33,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=198573.33333333334, ans=0.0 2023-12-21 19:37:39,458 INFO [train.py:886] (1/4) Epoch 7, batch 1200, loss[loss=0.01543, audio_tagging_loss=0.01543, over 25000.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4936610.08 frames. 
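
The `grad_scale` field comes from fp16 training (`use_fp16: True`): a gradient scaler multiplies the loss before backward, halves the scale when an overflow produces inf/nan gradients (likely why the earlier 64.0 became 32.0 around the epoch-6/7 boundary), and slowly grows it back after successful steps. Standard `torch.cuda.amp` usage showing where the number comes from; icefall wraps this with extra checks, so this is a sketch:

```python
# Minimal AMP step; scaler.get_scale() is the logged grad_scale value.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()   # backward on the scaled fp16 loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # halve on overflow, grow on success
    return loss.detach(), scaler.get_scale()
```
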
], batch size: 100, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:37:39,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=198640.0, ans=0.125 2023-12-21 19:37:49,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198706.66666666666, ans=0.1 2023-12-21 19:37:52,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=198706.66666666666, ans=0.2 2023-12-21 19:37:58,028 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.536e+01 2.723e+01 2.860e+01 3.472e+01, threshold=5.446e+01, percent-clipped=0.0 2023-12-21 19:38:00,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198773.33333333334, ans=0.125 2023-12-21 19:38:12,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=198840.0, ans=0.0 2023-12-21 19:38:19,354 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.597e-01 2023-12-21 19:38:31,184 INFO [train.py:886] (1/4) Epoch 7, batch 1250, loss[loss=0.01491, audio_tagging_loss=0.01491, over 24750.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4931479.28 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:38:40,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=198973.33333333334, ans=0.2 2023-12-21 19:38:44,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.53 vs. limit=22.5 2023-12-21 19:38:50,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=199040.0, ans=0.125 2023-12-21 19:39:00,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0 2023-12-21 19:39:03,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=199173.33333333334, ans=0.0 2023-12-21 19:39:23,916 INFO [train.py:886] (1/4) Epoch 7, batch 1300, loss[loss=0.01798, audio_tagging_loss=0.01798, over 24750.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4930734.51 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:39:24,046 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:39:42,182 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.626e+01 2.798e+01 3.036e+01 3.776e+01, threshold=5.596e+01, percent-clipped=0.0 2023-12-21 19:39:49,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=199440.0, ans=15.0 2023-12-21 19:40:15,021 INFO [train.py:886] (1/4) Epoch 7, batch 1350, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4928972.82 frames. 
], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:40:21,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199640.0, ans=0.1 2023-12-21 19:40:39,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=199773.33333333334, ans=0.2 2023-12-21 19:41:02,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-12-21 19:41:04,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2023-12-21 19:41:06,915 INFO [train.py:886] (1/4) Epoch 7, batch 1400, loss[loss=0.01588, audio_tagging_loss=0.01588, over 24750.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4940968.32 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:41:14,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2023-12-21 19:41:14,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2023-12-21 19:41:14,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199973.33333333334, ans=0.1 2023-12-21 19:41:18,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=200040.0, ans=0.125 2023-12-21 19:41:21,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=200040.0, ans=0.125 2023-12-21 19:41:25,722 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.540e+01 2.768e+01 3.021e+01 3.899e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 19:41:31,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=200106.66666666666, ans=0.125 2023-12-21 19:41:35,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200106.66666666666, ans=0.125 2023-12-21 19:41:40,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. limit=10.0 2023-12-21 19:41:44,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=200173.33333333334, ans=0.0 2023-12-21 19:41:54,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=200240.0, ans=0.125 2023-12-21 19:41:58,864 INFO [train.py:886] (1/4) Epoch 7, batch 1450, loss[loss=0.01779, audio_tagging_loss=0.01779, over 25000.00 frames. ], tot_loss[loss=0.01596, audio_tagging_loss=0.01596, over 4942584.58 frames. 
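
The `WithLoss: name=...self_attn_weights, loss-sum=...` records from `scaling.py:1118` track an auxiliary penalty attached to the attention weights; a sum of 0.000e+00 means the penalty is currently inactive for that module. The characteristic trick is to leave the forward value untouched while injecting the penalty's gradient in backward, so that the logged `loss-sum` is effectively added to the training loss. An illustrative reconstruction of that autograd pattern, not icefall's exact code:

```python
# Pass x through unchanged; make aux.sum() act as an extra loss in backward.
import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        ctx.aux_shape = aux.shape
        return x

    @staticmethod
    def backward(ctx, grad_out):
        # x keeps its ordinary gradient; aux receives d(aux.sum())/d(aux) = 1.
        aux_grad = torch.ones(ctx.aux_shape, device=grad_out.device,
                              dtype=grad_out.dtype)
        return grad_out, aux_grad

# Usage inside a module (penalty_of is a hypothetical penalty function):
# attn = WithLoss.apply(attn, penalty_of(attn))
```
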
], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:42:06,218 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:42:12,746 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.308e-01 2023-12-21 19:42:21,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=200440.0, ans=0.1 2023-12-21 19:42:30,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=200506.66666666666, ans=0.0 2023-12-21 19:42:32,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=200506.66666666666, ans=0.0 2023-12-21 19:42:38,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200573.33333333334, ans=0.1 2023-12-21 19:42:48,865 INFO [train.py:886] (1/4) Epoch 7, batch 1500, loss[loss=0.01485, audio_tagging_loss=0.01485, over 25000.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4947145.92 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:42:59,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=200706.66666666666, ans=0.125 2023-12-21 19:43:07,817 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.525e+01 2.780e+01 2.987e+01 4.498e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 19:43:12,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2023-12-21 19:43:15,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=200773.33333333334, ans=0.125 2023-12-21 19:43:33,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200906.66666666666, ans=0.1 2023-12-21 19:43:40,013 INFO [train.py:886] (1/4) Epoch 7, batch 1550, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4947839.78 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:43:43,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=200973.33333333334, ans=0.2 2023-12-21 19:43:49,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=201040.0, ans=0.0 2023-12-21 19:43:53,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=201040.0, ans=0.125 2023-12-21 19:44:03,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.38 vs. 
limit=15.0 2023-12-21 19:44:05,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=201106.66666666666, ans=0.125 2023-12-21 19:44:10,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201173.33333333334, ans=0.125 2023-12-21 19:44:14,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=201173.33333333334, ans=0.09899494936611666 2023-12-21 19:44:24,474 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.498e-01 2023-12-21 19:44:24,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=201240.0, ans=0.0 2023-12-21 19:44:29,925 INFO [train.py:886] (1/4) Epoch 7, batch 1600, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4939010.10 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:44:30,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201306.66666666666, ans=0.125 2023-12-21 19:44:42,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2023-12-21 19:44:49,145 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.627e+01 2.765e+01 2.991e+01 3.550e+01, threshold=5.529e+01, percent-clipped=0.0 2023-12-21 19:45:07,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=201506.66666666666, ans=0.2 2023-12-21 19:45:14,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-12-21 19:45:17,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=201573.33333333334, ans=0.125 2023-12-21 19:45:21,529 INFO [train.py:886] (1/4) Epoch 7, batch 1650, loss[loss=0.01525, audio_tagging_loss=0.01525, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 4941035.30 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:45:31,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2023-12-21 19:45:57,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=201840.0, ans=0.1 2023-12-21 19:46:12,802 INFO [train.py:886] (1/4) Epoch 7, batch 1700, loss[loss=0.01493, audio_tagging_loss=0.01493, over 25000.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4940025.03 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:46:13,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. 
limit=6.0 2023-12-21 19:46:21,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=202040.0, ans=0.125 2023-12-21 19:46:22,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=12.0 2023-12-21 19:46:30,527 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.547e+01 2.700e+01 2.895e+01 3.451e+01, threshold=5.401e+01, percent-clipped=0.0 2023-12-21 19:46:36,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.0 2023-12-21 19:46:37,579 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.447e-01 2023-12-21 19:46:38,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=202106.66666666666, ans=0.05 2023-12-21 19:46:44,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=202173.33333333334, ans=0.125 2023-12-21 19:46:45,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. limit=10.0 2023-12-21 19:46:49,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=202173.33333333334, ans=0.125 2023-12-21 19:46:56,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=202240.0, ans=0.0 2023-12-21 19:47:02,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2023-12-21 19:47:03,154 INFO [train.py:886] (1/4) Epoch 7, batch 1750, loss[loss=0.01915, audio_tagging_loss=0.01915, over 25000.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4938740.08 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:47:07,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.57 vs. limit=22.5 2023-12-21 19:47:18,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=202373.33333333334, ans=0.125 2023-12-21 19:47:22,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=202440.0, ans=0.0 2023-12-21 19:47:26,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=202440.0, ans=0.0 2023-12-21 19:47:31,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202440.0, ans=0.1 2023-12-21 19:47:45,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2023-12-21 19:47:53,953 INFO [train.py:886] (1/4) Epoch 7, batch 1800, loss[loss=0.0174, audio_tagging_loss=0.0174, over 24750.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4945328.02 frames. 
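The recurring WARNING lines from optim.py report the quartiles (min / 25% / median / 75% / max) of recent gradient norms together with a clipping threshold; in this stretch of the log the threshold consistently works out to Clipping_scale times the running median (for instance 2.0 x 2.700e+01 = 5.401e+01 in the warning just above). A minimal sketch of that bookkeeping, assuming a simple sliding window of per-step gradient norms; the class and names below are illustrative, not the optimizer's actual internals:

from collections import deque

import torch


class GradNormTracker:
    """Illustrative re-creation of the quartile/threshold bookkeeping in
    the optim.py warnings; not the actual ScaledAdam implementation."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def update_and_clip(self, params) -> None:
        grads = [p.grad.flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()   # this step's total grad norm
        self.norms.append(norm)

        q = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * q[2].item()   # scale * running median
        clipped = norm > threshold
        if clipped:                             # rescale grads onto the threshold
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            f"{' '.join(f'{v:.3e}' for v in q.tolist())}, "
            f"threshold={threshold:.3e}, percent-clipped={100.0 * clipped:.1f}"
        )

With percent-clipped=0.0 throughout this stretch, no step ever actually exceeded the threshold; the warnings are purely diagnostic here.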
], batch size: 99, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:48:01,107 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.852e-01 2023-12-21 19:48:02,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202640.0, ans=0.1 2023-12-21 19:48:12,067 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.560e+01 2.744e+01 2.950e+01 3.454e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 19:48:22,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=202773.33333333334, ans=0.2 2023-12-21 19:48:31,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=202840.0, ans=0.1 2023-12-21 19:48:33,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=202840.0, ans=22.5 2023-12-21 19:48:35,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=202906.66666666666, ans=0.125 2023-12-21 19:48:45,171 INFO [train.py:886] (1/4) Epoch 7, batch 1850, loss[loss=0.01549, audio_tagging_loss=0.01549, over 23982.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4949132.89 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:48:49,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=202973.33333333334, ans=0.09899494936611666 2023-12-21 19:48:51,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=202973.33333333334, ans=0.0 2023-12-21 19:49:29,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=203240.0, ans=0.0 2023-12-21 19:49:37,320 INFO [train.py:886] (1/4) Epoch 7, batch 1900, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24750.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4946394.34 frames. ], batch size: 99, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:49:45,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-12-21 19:49:45,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=203306.66666666666, ans=0.125 2023-12-21 19:49:48,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=203373.33333333334, ans=0.025 2023-12-21 19:49:56,105 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.649e+01 2.834e+01 2.981e+01 3.501e+01, threshold=5.668e+01, percent-clipped=0.0 2023-12-21 19:50:20,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=203573.33333333334, ans=0.125 2023-12-21 19:50:29,006 INFO [train.py:886] (1/4) Epoch 7, batch 1950, loss[loss=0.01827, audio_tagging_loss=0.01827, over 24750.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4943045.57 frames. 
], batch size: 99, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:50:34,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=203640.0, ans=0.125 2023-12-21 19:51:20,727 INFO [train.py:886] (1/4) Epoch 7, batch 2000, loss[loss=0.01506, audio_tagging_loss=0.01506, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4934782.21 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:51:23,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=203973.33333333334, ans=0.0 2023-12-21 19:51:33,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=204040.0, ans=0.2 2023-12-21 19:51:40,085 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.631e+01 2.743e+01 2.994e+01 3.542e+01, threshold=5.486e+01, percent-clipped=0.0 2023-12-21 19:51:46,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=204106.66666666666, ans=0.125 2023-12-21 19:51:55,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=204173.33333333334, ans=0.0 2023-12-21 19:52:05,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=204240.0, ans=0.025 2023-12-21 19:52:07,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=204240.0, ans=15.0 2023-12-21 19:52:12,973 INFO [train.py:886] (1/4) Epoch 7, batch 2050, loss[loss=0.01666, audio_tagging_loss=0.01666, over 25000.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 4940819.06 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 64.0 2023-12-21 19:52:30,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=204373.33333333334, ans=0.125 2023-12-21 19:52:32,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=12.0 2023-12-21 19:52:32,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=204440.0, ans=0.125 2023-12-21 19:52:37,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=204440.0, ans=0.2 2023-12-21 19:52:58,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=204573.33333333334, ans=0.125 2023-12-21 19:53:03,697 INFO [train.py:886] (1/4) Epoch 7, batch 2100, loss[loss=0.01712, audio_tagging_loss=0.01712, over 25000.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4951046.43 frames. 
], batch size: 100, lr: 1.56e-02, grad_scale: 64.0 2023-12-21 19:53:09,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=204640.0, ans=0.125 2023-12-21 19:53:11,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=204640.0, ans=0.125 2023-12-21 19:53:15,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=204706.66666666666, ans=0.0 2023-12-21 19:53:16,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=204706.66666666666, ans=0.0 2023-12-21 19:53:23,313 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.557e+01 2.714e+01 2.947e+01 3.593e+01, threshold=5.429e+01, percent-clipped=0.0 2023-12-21 19:53:29,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0 2023-12-21 19:53:42,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=204840.0, ans=0.2 2023-12-21 19:53:43,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=204840.0, ans=0.2 2023-12-21 19:53:44,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.03 vs. limit=10.0 2023-12-21 19:53:46,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.77 vs. limit=15.0 2023-12-21 19:53:54,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=204906.66666666666, ans=0.2 2023-12-21 19:53:56,456 INFO [train.py:886] (1/4) Epoch 7, batch 2150, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24750.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4956234.89 frames. 
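Each ScheduledFloat line resolves a hyperparameter (dropout probabilities, skip rates, balancer probs) to a value (`ans`) from the current batch_count; by this point of training the dropout_p entries have settled at ans=0.1 and many skip rates at 0.0. A minimal sketch of such a schedule as a piecewise-linear function of batch_count with flat extrapolation outside the breakpoints; the breakpoints below are made up for illustration and are not this run's actual schedule:

class PiecewiseLinearSchedule:
    """A float hyperparameter as a piecewise-linear function of
    batch_count, clamped to the first/last breakpoint outside the given
    range.  Illustrative; not icefall's ScheduledFloat class."""

    def __init__(self, *points):
        self.points = sorted(points)            # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:               # interpolate on [x0, x1]
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]


# Hypothetical breakpoints: a dropout_p that decays early in training and
# is flat at 0.1 by the batch counts seen here (batch_count ~ 2e5):
dropout_p = PiecewiseLinearSchedule((0.0, 0.2), (20000.0, 0.1))
assert dropout_p(10000.0) == 0.15
assert dropout_p(204973.33) == 0.1              # matches the logged ans=0.1

Flat extrapolation past the last breakpoint would explain why the same names keep logging the same ans while batch_count advances.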
], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:53:56,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204973.33333333334, ans=0.1 2023-12-21 19:53:57,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204973.33333333334, ans=0.125 2023-12-21 19:54:07,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=205040.0, ans=0.125 2023-12-21 19:54:07,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=205040.0, ans=0.0 2023-12-21 19:54:17,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=205106.66666666666, ans=0.125 2023-12-21 19:54:17,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=205106.66666666666, ans=0.125 2023-12-21 19:54:20,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=205106.66666666666, ans=0.0 2023-12-21 19:54:29,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-21 19:54:29,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=22.5 2023-12-21 19:54:30,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=205173.33333333334, ans=0.0 2023-12-21 19:54:36,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205240.0, ans=0.1 2023-12-21 19:54:40,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=205240.0, ans=0.0 2023-12-21 19:54:42,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=205240.0, ans=0.125 2023-12-21 19:54:47,854 INFO [train.py:886] (1/4) Epoch 7, batch 2200, loss[loss=0.01757, audio_tagging_loss=0.01757, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4951520.58 frames. 
], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:54:56,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=205306.66666666666, ans=0.2 2023-12-21 19:55:06,281 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.584e+01 2.750e+01 3.020e+01 3.487e+01, threshold=5.500e+01, percent-clipped=0.0 2023-12-21 19:55:13,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.612e-01 2023-12-21 19:55:16,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=205440.0, ans=0.0 2023-12-21 19:55:16,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=205440.0, ans=0.04949747468305833 2023-12-21 19:55:19,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=205506.66666666666, ans=0.125 2023-12-21 19:55:25,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=205506.66666666666, ans=0.0 2023-12-21 19:55:35,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.29 vs. limit=22.5 2023-12-21 19:55:38,662 INFO [train.py:886] (1/4) Epoch 7, batch 2250, loss[loss=0.01771, audio_tagging_loss=0.01771, over 24750.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4942078.53 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:55:49,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=205706.66666666666, ans=0.125 2023-12-21 19:56:27,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.219e-01 2023-12-21 19:56:29,787 INFO [train.py:886] (1/4) Epoch 7, batch 2300, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4943005.81 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:56:45,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=206040.0, ans=0.125 2023-12-21 19:56:48,195 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.578e+01 2.756e+01 2.990e+01 3.667e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 19:56:50,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2023-12-21 19:57:18,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=206240.0, ans=0.05 2023-12-21 19:57:20,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=206306.66666666666, ans=0.125 2023-12-21 19:57:21,960 INFO [train.py:886] (1/4) Epoch 7, batch 2350, loss[loss=0.01618, audio_tagging_loss=0.01618, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4946662.94 frames. 
], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:57:27,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=206306.66666666666, ans=0.0 2023-12-21 19:57:29,627 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.234e+00 2023-12-21 19:57:31,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=206373.33333333334, ans=0.125 2023-12-21 19:57:32,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.43 vs. limit=22.5 2023-12-21 19:57:33,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=206373.33333333334, ans=0.5 2023-12-21 19:57:36,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206373.33333333334, ans=0.1 2023-12-21 19:57:36,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=206373.33333333334, ans=0.1 2023-12-21 19:57:44,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=206440.0, ans=0.2 2023-12-21 19:57:50,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2023-12-21 19:58:01,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=206573.33333333334, ans=0.125 2023-12-21 19:58:13,400 INFO [train.py:886] (1/4) Epoch 7, batch 2400, loss[loss=0.01709, audio_tagging_loss=0.01709, over 25000.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4949335.91 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:58:19,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=206640.0, ans=0.0 2023-12-21 19:58:21,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=206640.0, ans=0.0 2023-12-21 19:58:31,991 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.547e+01 2.719e+01 2.913e+01 3.717e+01, threshold=5.437e+01, percent-clipped=0.0 2023-12-21 19:58:36,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=206773.33333333334, ans=0.0 2023-12-21 19:58:46,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.41 vs. limit=22.5 2023-12-21 19:58:53,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=206906.66666666666, ans=0.125 2023-12-21 19:59:05,242 INFO [train.py:886] (1/4) Epoch 7, batch 2450, loss[loss=0.01651, audio_tagging_loss=0.01651, over 25000.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4952546.92 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:59:10,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.82 vs. 
limit=10.0 2023-12-21 19:59:11,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-12-21 19:59:14,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=207040.0, ans=0.0 2023-12-21 19:59:27,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=207106.66666666666, ans=0.125 2023-12-21 19:59:32,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=207106.66666666666, ans=0.125 2023-12-21 19:59:56,866 INFO [train.py:886] (1/4) Epoch 7, batch 2500, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4951406.61 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 20:00:02,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207306.66666666666, ans=0.1 2023-12-21 20:00:09,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=207373.33333333334, ans=0.125 2023-12-21 20:00:13,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=207373.33333333334, ans=0.125 2023-12-21 20:00:15,329 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.623e+01 2.777e+01 2.967e+01 3.606e+01, threshold=5.553e+01, percent-clipped=0.0 2023-12-21 20:00:17,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=207440.0, ans=0.0 2023-12-21 20:00:30,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=207506.66666666666, ans=15.0 2023-12-21 20:00:39,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=207573.33333333334, ans=0.0 2023-12-21 20:00:48,973 INFO [train.py:886] (1/4) Epoch 7, batch 2550, loss[loss=0.01698, audio_tagging_loss=0.01698, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4944584.96 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 20:01:16,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207773.33333333334, ans=0.1 2023-12-21 20:01:30,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-12-21 20:01:34,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207906.66666666666, ans=0.1 2023-12-21 20:01:38,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=207906.66666666666, ans=0.0 2023-12-21 20:01:41,314 INFO [train.py:886] (1/4) Epoch 7, batch 2600, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01608, audio_tagging_loss=0.01608, over 4938983.03 frames. 
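The Whitening lines compare a per-module metric against a (possibly scheduled) limit: most modules sit comfortably below it (e.g. metric=3.48 vs. limit=15.0 above), while occasional ones overshoot (metric=27.43 vs. limit=22.5). A natural reading is that the metric measures how far the group's feature covariance is from a multiple of the identity, equalling 1.0 for perfectly white features. The sketch below uses one standard such statistic, d * tr(C^2) / tr(C)^2; this is an assumption about the metric's exact form, not a quote of scaling.py:

import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """How far the feature covariance is from a multiple of the identity:
    1.0 for perfectly white features, larger otherwise.  An assumed form
    of the value logged as `metric=... vs. limit=...`."""
    x = x.reshape(-1, x.shape[-1])                # (frames, channels)
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :] - x[:, g, :].mean(dim=0)  # center each channel
        cov = (xg.T @ xg) / num_frames            # (d, d) group covariance
        d = cov.shape[0]
        # d * tr(C @ C) / tr(C)^2 >= 1, with equality iff C is proportional
        # to the identity (Cauchy-Schwarz on the eigenvalues of C).
        metrics.append(d * (cov * cov).sum() / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()


x = torch.randn(1000, 512)
print(whitening_metric(x))   # ~1.5: near the ideal 1.0, plus ~d/n sampling noise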
], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:01:48,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=207973.33333333334, ans=0.125 2023-12-21 20:01:53,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=208040.0, ans=0.125 2023-12-21 20:01:58,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=208040.0, ans=0.09899494936611666 2023-12-21 20:01:59,604 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.546e+01 2.741e+01 2.943e+01 4.018e+01, threshold=5.482e+01, percent-clipped=0.0 2023-12-21 20:02:12,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=208173.33333333334, ans=0.125 2023-12-21 20:02:13,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=208173.33333333334, ans=0.0 2023-12-21 20:02:30,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-21 20:02:32,848 INFO [train.py:886] (1/4) Epoch 7, batch 2650, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4939797.64 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:02:45,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208373.33333333334, ans=0.1 2023-12-21 20:02:51,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=208373.33333333334, ans=15.0 2023-12-21 20:02:58,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=208440.0, ans=0.0 2023-12-21 20:03:01,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=208440.0, ans=0.0 2023-12-21 20:03:04,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=15.0 2023-12-21 20:03:05,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-21 20:03:06,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=208506.66666666666, ans=0.2 2023-12-21 20:03:15,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=208573.33333333334, ans=0.125 2023-12-21 20:03:24,918 INFO [train.py:886] (1/4) Epoch 7, batch 2700, loss[loss=0.01754, audio_tagging_loss=0.01754, over 25000.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4945073.22 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:03:35,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. 
limit=22.5 2023-12-21 20:03:43,481 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.523e+01 2.665e+01 2.872e+01 3.649e+01, threshold=5.330e+01, percent-clipped=0.0 2023-12-21 20:03:50,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2023-12-21 20:03:54,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=208773.33333333334, ans=0.2 2023-12-21 20:03:59,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=12.0 2023-12-21 20:04:00,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=208840.0, ans=0.0 2023-12-21 20:04:16,740 INFO [train.py:886] (1/4) Epoch 7, batch 2750, loss[loss=0.01709, audio_tagging_loss=0.01709, over 25000.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4948293.85 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:04:28,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209040.0, ans=0.125 2023-12-21 20:04:29,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=209040.0, ans=0.0 2023-12-21 20:04:49,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209173.33333333334, ans=0.125 2023-12-21 20:04:54,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=209173.33333333334, ans=0.0 2023-12-21 20:05:08,637 INFO [train.py:886] (1/4) Epoch 7, batch 2800, loss[loss=0.02039, audio_tagging_loss=0.02039, over 24955.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4948700.90 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:05:10,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=209306.66666666666, ans=0.125 2023-12-21 20:05:12,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=209306.66666666666, ans=0.125 2023-12-21 20:05:15,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-12-21 20:05:28,167 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.667e+01 2.782e+01 2.958e+01 3.854e+01, threshold=5.564e+01, percent-clipped=0.0 2023-12-21 20:05:45,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209506.66666666666, ans=0.125 2023-12-21 20:05:48,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=209506.66666666666, ans=0.125 2023-12-21 20:06:00,781 INFO [train.py:886] (1/4) Epoch 7, batch 2850, loss[loss=0.01752, audio_tagging_loss=0.01752, over 24750.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4944484.10 frames. 
], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:06:14,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.96 vs. limit=22.5 2023-12-21 20:06:20,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=209773.33333333334, ans=0.025 2023-12-21 20:06:24,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=209773.33333333334, ans=0.0 2023-12-21 20:06:26,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2023-12-21 20:06:51,797 INFO [train.py:886] (1/4) Epoch 7, batch 2900, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4943630.49 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:06:55,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209973.33333333334, ans=0.125 2023-12-21 20:07:00,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209973.33333333334, ans=0.1 2023-12-21 20:07:10,771 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.581e+01 2.783e+01 3.000e+01 3.656e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 20:07:36,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=210240.0, ans=0.0 2023-12-21 20:07:43,857 INFO [train.py:886] (1/4) Epoch 7, batch 2950, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24750.00 frames. ], tot_loss[loss=0.01605, audio_tagging_loss=0.01605, over 4940936.33 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:07:49,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=210306.66666666666, ans=0.125 2023-12-21 20:07:50,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210306.66666666666, ans=0.1 2023-12-21 20:07:55,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=210373.33333333334, ans=0.2 2023-12-21 20:07:57,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=210373.33333333334, ans=0.2 2023-12-21 20:08:04,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=210440.0, ans=0.125 2023-12-21 20:08:06,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0 2023-12-21 20:08:09,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210440.0, ans=0.125 2023-12-21 20:08:22,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.13 vs. limit=22.5 2023-12-21 20:08:36,033 INFO [train.py:886] (1/4) Epoch 7, batch 3000, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. 
], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4947513.98 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:08:36,033 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 20:08:50,555 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7014, 2.3144, 2.8295, 2.9379, 2.7235, 2.6919, 1.7996, 2.6588], device='cuda:1') 2023-12-21 20:08:57,465 INFO [train.py:917] (1/4) Epoch 7, validation: loss=0.03818, audio_tagging_loss=0.03818, over 3737520.00 frames. 2023-12-21 20:08:57,466 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 20:09:05,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2023-12-21 20:09:12,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-12-21 20:09:13,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=210706.66666666666, ans=0.0 2023-12-21 20:09:15,949 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.521e+01 2.655e+01 2.830e+01 3.730e+01, threshold=5.311e+01, percent-clipped=0.0 2023-12-21 20:09:19,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=210773.33333333334, ans=0.2 2023-12-21 20:09:22,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=210773.33333333334, ans=0.035 2023-12-21 20:09:27,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=210840.0, ans=0.125 2023-12-21 20:09:29,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=210840.0, ans=0.125 2023-12-21 20:09:42,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.03 vs. limit=6.0 2023-12-21 20:09:49,769 INFO [train.py:886] (1/4) Epoch 7, batch 3050, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4949785.12 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:10:22,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=211173.33333333334, ans=0.125 2023-12-21 20:10:32,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=211240.0, ans=0.2 2023-12-21 20:10:42,211 INFO [train.py:886] (1/4) Epoch 7, batch 3100, loss[loss=0.01585, audio_tagging_loss=0.01585, over 25000.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 4953430.51 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:10:51,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.39 vs. 
limit=15.0 2023-12-21 20:11:00,332 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.611e+01 2.756e+01 2.922e+01 4.082e+01, threshold=5.512e+01, percent-clipped=0.0 2023-12-21 20:11:05,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=211440.0, ans=0.0 2023-12-21 20:11:28,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=211573.33333333334, ans=0.07 2023-12-21 20:11:30,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=211573.33333333334, ans=0.07 2023-12-21 20:11:33,195 INFO [train.py:886] (1/4) Epoch 7, batch 3150, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4941082.56 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:12:03,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=211840.0, ans=0.5 2023-12-21 20:12:05,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=211840.0, ans=0.125 2023-12-21 20:12:11,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=211840.0, ans=0.05 2023-12-21 20:12:25,089 INFO [train.py:886] (1/4) Epoch 7, batch 3200, loss[loss=0.01777, audio_tagging_loss=0.01777, over 24750.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4935392.21 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:12:39,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=212040.0, ans=0.0 2023-12-21 20:12:40,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-12-21 20:12:43,125 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.209e+01 2.609e+01 2.784e+01 2.998e+01 3.894e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 20:12:47,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.10 vs. limit=10.0 2023-12-21 20:12:49,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=212106.66666666666, ans=0.02 2023-12-21 20:13:07,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=212240.0, ans=0.0 2023-12-21 20:13:07,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=212240.0, ans=0.125 2023-12-21 20:13:16,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=212306.66666666666, ans=0.125 2023-12-21 20:13:17,068 INFO [train.py:886] (1/4) Epoch 7, batch 3250, loss[loss=0.01635, audio_tagging_loss=0.01635, over 24750.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4936633.23 frames. 
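Every 50 batches, train.py:886 prints the current batch's loss next to tot_loss, an aggregate over roughly the last five million frames (about 200 batches of ~25000 frames each). A minimal sketch of frame-weighted aggregation that produces numbers of this shape, assuming tot_loss is a windowed weighted mean; icefall's actual tracker may window or decay old batches differently:

from collections import deque


class RunningLoss:
    """Frame-weighted mean of the loss over the most recent batches; an
    illustrative stand-in for the logged tot_loss."""

    def __init__(self, max_batches: int = 200):
        self.window = deque(maxlen=max_batches)   # (loss * frames, frames)

    def update(self, loss: float, num_frames: float) -> None:
        self.window.append((loss * num_frames, num_frames))

    def summary(self) -> str:
        frames = sum(f for _, f in self.window)
        tot = sum(s for s, _ in self.window) / max(frames, 1.0)
        return f"tot_loss[loss={tot:.4g}, over {frames:.2f} frames. ]"


tracker = RunningLoss()
for _ in range(200):                              # 200 batches of ~25k frames
    tracker.update(loss=0.016, num_frames=24750.0)
print(tracker.summary())                          # ~4.95e6 frames, as in the log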
], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:13:19,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2023-12-21 20:13:34,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=212373.33333333334, ans=0.125 2023-12-21 20:13:45,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=212440.0, ans=22.5 2023-12-21 20:14:08,617 INFO [train.py:886] (1/4) Epoch 7, batch 3300, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4936239.82 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:14:08,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=212640.0, ans=0.125 2023-12-21 20:14:27,796 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+01 2.590e+01 2.789e+01 2.998e+01 3.537e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 20:14:39,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=212840.0, ans=0.07 2023-12-21 20:14:51,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=212906.66666666666, ans=0.0 2023-12-21 20:15:01,348 INFO [train.py:886] (1/4) Epoch 7, batch 3350, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24750.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4945181.90 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:15:13,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. limit=10.0 2023-12-21 20:15:26,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2023-12-21 20:15:30,958 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.615e-01 2023-12-21 20:15:45,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=213240.0, ans=0.125 2023-12-21 20:15:53,144 INFO [train.py:886] (1/4) Epoch 7, batch 3400, loss[loss=0.01679, audio_tagging_loss=0.01679, over 25000.00 frames. ], tot_loss[loss=0.01588, audio_tagging_loss=0.01588, over 4943460.26 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:16:09,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2023-12-21 20:16:13,541 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.594e+01 2.749e+01 2.971e+01 3.801e+01, threshold=5.499e+01, percent-clipped=0.0 2023-12-21 20:16:15,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.99 vs. 
limit=15.0 2023-12-21 20:16:21,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=213440.0, ans=0.04949747468305833 2023-12-21 20:16:30,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=213506.66666666666, ans=0.07 2023-12-21 20:16:36,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213573.33333333334, ans=0.1 2023-12-21 20:16:46,481 INFO [train.py:886] (1/4) Epoch 7, batch 3450, loss[loss=0.01565, audio_tagging_loss=0.01565, over 24750.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4941047.05 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:16:49,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=213640.0, ans=0.0 2023-12-21 20:16:56,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-12-21 20:17:38,622 INFO [train.py:886] (1/4) Epoch 7, batch 3500, loss[loss=0.01493, audio_tagging_loss=0.01493, over 21302.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4932525.07 frames. ], batch size: 107, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:17:50,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=214040.0, ans=0.125 2023-12-21 20:17:50,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2023-12-21 20:17:56,290 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.629e+01 2.792e+01 2.990e+01 3.562e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-21 20:17:59,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=214106.66666666666, ans=0.125 2023-12-21 20:18:06,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=214106.66666666666, ans=0.125 2023-12-21 20:18:17,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=214240.0, ans=0.125 2023-12-21 20:18:29,578 INFO [train.py:886] (1/4) Epoch 7, batch 3550, loss[loss=0.02025, audio_tagging_loss=0.02025, over 24750.00 frames. ], tot_loss[loss=0.01588, audio_tagging_loss=0.01588, over 4938812.95 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:18:50,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214440.0, ans=0.125 2023-12-21 20:19:20,725 INFO [train.py:886] (1/4) Epoch 7, batch 3600, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 4938322.61 frames. 
], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:19:30,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=214706.66666666666, ans=0.0 2023-12-21 20:19:39,453 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.510e+01 2.669e+01 2.897e+01 3.609e+01, threshold=5.338e+01, percent-clipped=0.0 2023-12-21 20:19:42,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214773.33333333334, ans=0.1 2023-12-21 20:19:44,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=214773.33333333334, ans=0.0 2023-12-21 20:19:53,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=214840.0, ans=0.0 2023-12-21 20:19:59,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=214840.0, ans=0.05 2023-12-21 20:20:07,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-12-21 20:20:12,788 INFO [train.py:886] (1/4) Epoch 7, batch 3650, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4943558.56 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:20:16,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=214973.33333333334, ans=0.125 2023-12-21 20:20:31,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=215040.0, ans=0.0 2023-12-21 20:20:37,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215106.66666666666, ans=0.0 2023-12-21 20:20:56,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.33 vs. limit=22.5 2023-12-21 20:21:02,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=12.0 2023-12-21 20:21:04,226 INFO [train.py:886] (1/4) Epoch 7, batch 3700, loss[loss=0.01787, audio_tagging_loss=0.01787, over 24750.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4946015.57 frames. 
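The logged learning rate creeps down from 1.57e-02 to 1.52e-02 over this window, i.e. it decays smoothly with both the global batch index and the epoch rather than in discrete steps. A sketch of an Eden-style schedule with that property; the functional form is the one usually quoted for Zipformer recipes, and the constants are placeholders rather than this run's configuration:

def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    """Eden-style schedule: the lr decays smoothly in both the global
    step and the (fractional) epoch.  All constants are placeholders;
    this run's actual base lr and schedule parameters are not assumed."""
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor


# With any fixed base_lr, consecutive 50-batch intervals late in epoch 7
# change the lr only in the third significant digit, which is why the log
# shows a slow 1.57e-02 -> 1.52e-02 drift rather than visible steps.
for step in (26000, 26050, 28000):
    print(f"step={step}: lr={eden_lr(0.04, step, epoch=7.0):.3e}")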
], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:21:20,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=215373.33333333334, ans=0.09899494936611666 2023-12-21 20:21:24,016 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.509e+01 2.714e+01 2.873e+01 3.497e+01, threshold=5.429e+01, percent-clipped=0.0 2023-12-21 20:21:41,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215506.66666666666, ans=0.1 2023-12-21 20:21:47,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=215573.33333333334, ans=0.0 2023-12-21 20:21:50,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=15.0 2023-12-21 20:21:55,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.39 vs. limit=22.5 2023-12-21 20:21:56,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.60 vs. limit=22.5 2023-12-21 20:21:56,907 INFO [train.py:886] (1/4) Epoch 7, batch 3750, loss[loss=0.01932, audio_tagging_loss=0.01932, over 24750.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4945524.50 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:22:02,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=215640.0, ans=0.125 2023-12-21 20:22:04,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=215640.0, ans=0.125 2023-12-21 20:22:10,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-12-21 20:22:13,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215706.66666666666, ans=0.0 2023-12-21 20:22:19,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=15.0 2023-12-21 20:22:31,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=215840.0, ans=0.125 2023-12-21 20:22:47,568 INFO [train.py:886] (1/4) Epoch 7, batch 3800, loss[loss=0.01405, audio_tagging_loss=0.01405, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 4944086.63 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:23:06,292 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 2.628e+01 2.789e+01 3.025e+01 3.707e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 20:23:39,554 INFO [train.py:886] (1/4) Epoch 7, batch 3850, loss[loss=0.01983, audio_tagging_loss=0.01983, over 25000.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4939584.79 frames. 
], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:24:00,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=216440.0, ans=0.125 2023-12-21 20:24:17,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=216506.66666666666, ans=0.2 2023-12-21 20:24:30,983 INFO [train.py:886] (1/4) Epoch 7, batch 3900, loss[loss=0.01811, audio_tagging_loss=0.01811, over 25000.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4948670.13 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:24:38,570 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.241e+00 2023-12-21 20:24:48,638 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.626e+01 2.763e+01 2.975e+01 3.870e+01, threshold=5.525e+01, percent-clipped=0.0 2023-12-21 20:24:55,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=216773.33333333334, ans=0.5 2023-12-21 20:25:07,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2023-12-21 20:25:12,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=216906.66666666666, ans=0.2 2023-12-21 20:25:12,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=216906.66666666666, ans=0.0 2023-12-21 20:25:21,835 INFO [train.py:886] (1/4) Epoch 7, batch 3950, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4947066.92 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:25:39,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=217040.0, ans=0.125 2023-12-21 20:25:51,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=217106.66666666666, ans=0.0 2023-12-21 20:25:56,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=217173.33333333334, ans=15.0 2023-12-21 20:26:04,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-21 20:26:08,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2023-12-21 20:26:12,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=217240.0, ans=0.07 2023-12-21 20:26:14,573 INFO [train.py:886] (1/4) Epoch 7, batch 4000, loss[loss=0.01621, audio_tagging_loss=0.01621, over 25000.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4951973.30 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:26:23,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=15.0 2023-12-21 20:26:29,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=217373.33333333334, ans=0.125 2023-12-21 20:26:32,675 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.642e+01 2.775e+01 3.007e+01 3.915e+01, threshold=5.549e+01, percent-clipped=0.0 2023-12-21 20:26:34,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.00 vs. limit=15.0 2023-12-21 20:26:35,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=217440.0, ans=0.2 2023-12-21 20:26:49,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-12-21 20:26:52,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=217506.66666666666, ans=0.09899494936611666 2023-12-21 20:26:56,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=217573.33333333334, ans=10.0 2023-12-21 20:27:05,537 INFO [train.py:886] (1/4) Epoch 7, batch 4050, loss[loss=0.01615, audio_tagging_loss=0.01615, over 24750.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4949679.43 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:27:27,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=217773.33333333334, ans=0.125 2023-12-21 20:27:51,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=15.0 2023-12-21 20:27:58,024 INFO [train.py:886] (1/4) Epoch 7, batch 4100, loss[loss=0.01608, audio_tagging_loss=0.01608, over 24750.00 frames. ], tot_loss[loss=0.01615, audio_tagging_loss=0.01615, over 4939979.50 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:28:02,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=217973.33333333334, ans=0.025 2023-12-21 20:28:13,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=218040.0, ans=0.125 2023-12-21 20:28:18,077 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.585e+01 2.755e+01 2.921e+01 3.340e+01, threshold=5.510e+01, percent-clipped=0.0 2023-12-21 20:28:42,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=218240.0, ans=0.04949747468305833 2023-12-21 20:28:50,134 INFO [train.py:886] (1/4) Epoch 7, batch 4150, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4935750.92 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:28:52,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218306.66666666666, ans=0.1 2023-12-21 20:28:57,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. 
limit=15.0 2023-12-21 20:28:58,565 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.371e-02 2023-12-21 20:29:11,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=218440.0, ans=0.125 2023-12-21 20:29:41,476 INFO [train.py:886] (1/4) Epoch 7, batch 4200, loss[loss=0.01477, audio_tagging_loss=0.01477, over 24750.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4937553.40 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:29:46,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=218640.0, ans=10.0 2023-12-21 20:30:01,127 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.567e+01 2.744e+01 2.993e+01 3.872e+01, threshold=5.489e+01, percent-clipped=0.0 2023-12-21 20:30:02,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-21 20:30:05,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=218773.33333333334, ans=0.0 2023-12-21 20:30:06,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=218773.33333333334, ans=0.05 2023-12-21 20:30:07,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=218773.33333333334, ans=0.0 2023-12-21 20:30:10,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-21 20:30:19,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=218840.0, ans=0.0 2023-12-21 20:30:28,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218906.66666666666, ans=0.1 2023-12-21 20:30:31,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=218906.66666666666, ans=0.0 2023-12-21 20:30:33,462 INFO [train.py:886] (1/4) Epoch 7, batch 4250, loss[loss=0.01788, audio_tagging_loss=0.01788, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4942962.33 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:30:45,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=219040.0, ans=0.0 2023-12-21 20:30:46,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219040.0, ans=0.1 2023-12-21 20:30:51,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=219040.0, ans=0.125 2023-12-21 20:31:05,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. 
limit=15.0 2023-12-21 20:31:19,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219240.0, ans=0.1 2023-12-21 20:31:24,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=219306.66666666666, ans=0.125 2023-12-21 20:31:25,692 INFO [train.py:886] (1/4) Epoch 7, batch 4300, loss[loss=0.01814, audio_tagging_loss=0.01814, over 25000.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 4949348.73 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:31:26,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=219306.66666666666, ans=0.0 2023-12-21 20:31:26,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=219306.66666666666, ans=0.05 2023-12-21 20:31:32,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=219306.66666666666, ans=0.125 2023-12-21 20:31:32,215 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.110e+00 2023-12-21 20:31:45,086 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 2.678e+01 2.884e+01 3.041e+01 3.867e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-21 20:31:46,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=219440.0, ans=0.125 2023-12-21 20:31:47,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0 2023-12-21 20:31:48,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=219440.0, ans=0.125 2023-12-21 20:31:50,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.86 vs. limit=15.0 2023-12-21 20:32:01,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=219506.66666666666, ans=0.0 2023-12-21 20:32:02,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=219506.66666666666, ans=0.2 2023-12-21 20:32:14,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=219573.33333333334, ans=0.125 2023-12-21 20:32:16,668 INFO [train.py:886] (1/4) Epoch 7, batch 4350, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4955236.47 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:32:23,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=219640.0, ans=0.125 2023-12-21 20:32:43,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.06 vs. 
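limit=22.5

The Whitening entries from scaling.py:1022 compare a per-module whiteness statistic against a scheduled limit; when the metric exceeds the limit (for example metric=16.24 vs. limit=15.0 a little further down) the Whiten module nudges activations back toward an isotropic channel covariance via a gradient penalty. The exact statistic lives in icefall's scaling.py; the following is only a plausible sketch of such a metric, scale-invariant and equal to 1.0 for perfectly white features:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Illustrative whiteness measure over (num_frames, num_channels)
    features, computed per channel group as in the log lines."""
    num_channels = x.shape[-1]
    x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)        # zero-mean per group
    cov = x.transpose(1, 2) @ x / x.shape[1]   # (num_groups, c, c)
    c = cov.shape[-1]
    diag_mean = cov.diagonal(dim1=1, dim2=2).mean(dim=1)
    sq_mean = (cov ** 2).mean(dim=(1, 2)) * c
    # equals 1.0 iff cov is a multiple of the identity; grows as
    # channels become correlated or unevenly scaled
    return (sq_mean / (diag_mean ** 2 + 1e-20)).mean()
```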
2023-12-21 20:32:52,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=219840.0, ans=0.0 2023-12-21 20:32:54,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=219840.0, ans=0.0 2023-12-21 20:32:57,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219906.66666666666, ans=0.125 2023-12-21 20:32:59,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=219906.66666666666, ans=0.1 2023-12-21 20:33:09,190 INFO [train.py:886] (1/4) Epoch 7, batch 4400, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4954305.52 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:33:28,580 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.625e+01 2.823e+01 3.033e+01 3.448e+01, threshold=5.646e+01, percent-clipped=0.0 2023-12-21 20:33:29,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=220106.66666666666, ans=0.125 2023-12-21 20:33:37,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=220106.66666666666, ans=0.125 2023-12-21 20:33:59,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=220306.66666666666, ans=0.0 2023-12-21 20:34:00,665 INFO [train.py:886] (1/4) Epoch 7, batch 4450, loss[loss=0.01581, audio_tagging_loss=0.01581, over 24750.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 4954409.96 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:34:02,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=220306.66666666666, ans=0.02 2023-12-21 20:34:05,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=220306.66666666666, ans=0.1 2023-12-21 20:34:08,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=220306.66666666666, ans=0.0 2023-12-21 20:34:14,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=220373.33333333334, ans=0.0 2023-12-21 20:34:22,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0 2023-12-21 20:34:52,050 INFO [train.py:886] (1/4) Epoch 7, batch 4500, loss[loss=0.01509, audio_tagging_loss=0.01509, over 24750.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4955464.47 frames.
], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:34:52,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=220640.0, ans=0.125 2023-12-21 20:34:59,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=220640.0, ans=0.125 2023-12-21 20:35:01,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=220640.0, ans=0.125 2023-12-21 20:35:05,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=220706.66666666666, ans=0.07 2023-12-21 20:35:10,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=220706.66666666666, ans=0.125 2023-12-21 20:35:12,166 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.651e+01 2.803e+01 3.008e+01 3.714e+01, threshold=5.606e+01, percent-clipped=0.0 2023-12-21 20:35:20,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220773.33333333334, ans=0.125 2023-12-21 20:35:32,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=220906.66666666666, ans=0.1 2023-12-21 20:35:35,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=220906.66666666666, ans=0.125 2023-12-21 20:35:38,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:35:44,492 INFO [train.py:886] (1/4) Epoch 7, batch 4550, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4954908.08 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:35:49,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=220973.33333333334, ans=0.2 2023-12-21 20:35:51,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-12-21 20:35:58,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=221040.0, ans=0.035 2023-12-21 20:36:03,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=12.0 2023-12-21 20:36:21,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2023-12-21 20:36:35,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=221306.66666666666, ans=0.125 2023-12-21 20:36:36,199 INFO [train.py:886] (1/4) Epoch 7, batch 4600, loss[loss=0.0158, audio_tagging_loss=0.0158, over 22198.00 frames. ], tot_loss[loss=0.01596, audio_tagging_loss=0.01596, over 4950490.87 frames. 
], batch size: 107, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:36:38,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=221306.66666666666, ans=0.125 2023-12-21 20:36:42,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=221306.66666666666, ans=0.95 2023-12-21 20:36:42,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0 2023-12-21 20:36:43,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=221306.66666666666, ans=0.125 2023-12-21 20:36:48,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=221373.33333333334, ans=0.125 2023-12-21 20:36:56,447 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.617e+01 2.767e+01 2.974e+01 3.555e+01, threshold=5.535e+01, percent-clipped=0.0 2023-12-21 20:36:57,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221440.0, ans=0.1 2023-12-21 20:37:28,712 INFO [train.py:886] (1/4) Epoch 7, batch 4650, loss[loss=0.01656, audio_tagging_loss=0.01656, over 25000.00 frames. ], tot_loss[loss=0.01596, audio_tagging_loss=0.01596, over 4950602.89 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:37:49,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=221773.33333333334, ans=0.025 2023-12-21 20:37:54,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=221773.33333333334, ans=0.0 2023-12-21 20:37:59,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=221840.0, ans=0.125 2023-12-21 20:38:01,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221840.0, ans=0.1 2023-12-21 20:38:19,035 INFO [train.py:886] (1/4) Epoch 7, batch 4700, loss[loss=0.01368, audio_tagging_loss=0.01368, over 24750.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4947918.05 frames. 
], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:38:21,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221973.33333333334, ans=0.1 2023-12-21 20:38:22,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=221973.33333333334, ans=0.125 2023-12-21 20:38:32,684 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.278e-01 2023-12-21 20:38:36,956 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.697e+01 2.843e+01 3.036e+01 3.752e+01, threshold=5.686e+01, percent-clipped=0.0 2023-12-21 20:38:41,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=222106.66666666666, ans=0.2 2023-12-21 20:38:43,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2023-12-21 20:38:48,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222173.33333333334, ans=0.1 2023-12-21 20:38:51,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=222173.33333333334, ans=0.125 2023-12-21 20:39:01,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=222240.0, ans=0.125 2023-12-21 20:39:01,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=12.0 2023-12-21 20:39:03,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=222240.0, ans=0.0 2023-12-21 20:39:06,412 INFO [train.py:886] (1/4) Epoch 7, batch 4750, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4940986.28 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:39:08,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=222306.66666666666, ans=0.0 2023-12-21 20:39:42,215 INFO [train.py:886] (1/4) Epoch 8, batch 0, loss[loss=0.04789, audio_tagging_loss=0.04789, over 20843.00 frames. ], tot_loss[loss=0.04789, audio_tagging_loss=0.04789, over 20843.00 frames. ], batch size: 107, lr: 1.41e-02, grad_scale: 64.0 2023-12-21 20:39:42,216 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 20:39:59,301 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8020, 2.7966, 3.5957, 3.9364], device='cuda:1') 2023-12-21 20:39:59,975 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8079, 2.8478, 3.6222, 3.9345], device='cuda:1') 2023-12-21 20:40:03,500 INFO [train.py:917] (1/4) Epoch 8, validation: loss=0.0357, audio_tagging_loss=0.0357, over 3737520.00 frames. 
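During each validation pass the trainer also dumps attn_weights_entropy tensors from zipformer.py:1858, by the look of it one value per attention head of the named layer (four values in the tensors above). High entropy means a head spreads its attention over many frames; values near zero mean it attends sharply. A sketch of how such a diagnostic can be computed, with the (heads, batch, queries, keys) layout assumed rather than taken from zipformer.py:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, batch, tgt_len, src_len); each row along the
    last dim is a softmax distribution over source positions.
    Returns one mean entropy per head, like the logged tensors."""
    entropy = -(attn * (attn + 1.0e-20).log()).sum(dim=-1)  # per query
    return entropy.mean(dim=(1, 2))                         # per head
```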
2023-12-21 20:40:03,500 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 20:40:43,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=222613.33333333334, ans=0.0 2023-12-21 20:40:52,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=222680.0, ans=0.0 2023-12-21 20:40:55,247 INFO [train.py:886] (1/4) Epoch 8, batch 50, loss[loss=0.01981, audio_tagging_loss=0.01981, over 25000.00 frames. ], tot_loss[loss=0.0261, audio_tagging_loss=0.0261, over 1109282.44 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:40:59,553 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.852e+01 3.338e+01 3.973e+01 1.217e+02, threshold=6.676e+01, percent-clipped=6.0 2023-12-21 20:41:28,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222946.66666666666, ans=0.1 2023-12-21 20:41:47,037 INFO [train.py:886] (1/4) Epoch 8, batch 100, loss[loss=0.01506, audio_tagging_loss=0.01506, over 25000.00 frames. ], tot_loss[loss=0.02212, audio_tagging_loss=0.02212, over 1969975.16 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:41:52,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2023-12-21 20:42:02,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.26 vs. limit=22.5 2023-12-21 20:42:25,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=223280.0, ans=0.125 2023-12-21 20:42:29,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223346.66666666666, ans=0.1 2023-12-21 20:42:31,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=223346.66666666666, ans=0.125 2023-12-21 20:42:37,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=223413.33333333334, ans=0.125 2023-12-21 20:42:37,956 INFO [train.py:886] (1/4) Epoch 8, batch 150, loss[loss=0.01851, audio_tagging_loss=0.01851, over 22860.00 frames. ], tot_loss[loss=0.01985, audio_tagging_loss=0.01985, over 2630047.26 frames. 
], batch size: 107, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:42:38,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=223413.33333333334, ans=0.2 2023-12-21 20:42:41,688 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.729e+01 2.893e+01 3.075e+01 3.731e+01, threshold=5.785e+01, percent-clipped=0.0 2023-12-21 20:42:48,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=223480.0, ans=0.1 2023-12-21 20:42:55,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=223480.0, ans=0.125 2023-12-21 20:43:03,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=223546.66666666666, ans=0.125 2023-12-21 20:43:15,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=223613.33333333334, ans=0.0 2023-12-21 20:43:18,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-12-21 20:43:23,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=223680.0, ans=0.125 2023-12-21 20:43:27,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=223680.0, ans=0.125 2023-12-21 20:43:28,866 INFO [train.py:886] (1/4) Epoch 8, batch 200, loss[loss=0.01725, audio_tagging_loss=0.01725, over 25000.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 3151926.97 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:43:32,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2023-12-21 20:43:43,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=223813.33333333334, ans=0.125 2023-12-21 20:44:01,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=223946.66666666666, ans=0.125 2023-12-21 20:44:05,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=223946.66666666666, ans=0.5 2023-12-21 20:44:21,446 INFO [train.py:886] (1/4) Epoch 8, batch 250, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01775, audio_tagging_loss=0.01775, over 3556706.97 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:44:25,227 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.594e+01 2.804e+01 3.008e+01 3.563e+01, threshold=5.609e+01, percent-clipped=0.0 2023-12-21 20:44:31,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. 
limit=15.0 2023-12-21 20:44:37,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=224146.66666666666, ans=0.0 2023-12-21 20:44:46,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2023-12-21 20:45:08,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=224346.66666666666, ans=0.025 2023-12-21 20:45:11,818 INFO [train.py:886] (1/4) Epoch 8, batch 300, loss[loss=0.01849, audio_tagging_loss=0.01849, over 24945.00 frames. ], tot_loss[loss=0.01739, audio_tagging_loss=0.01739, over 3866151.17 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:45:29,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.77 vs. limit=10.0 2023-12-21 20:46:04,447 INFO [train.py:886] (1/4) Epoch 8, batch 350, loss[loss=0.01621, audio_tagging_loss=0.01621, over 25000.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4105402.42 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:46:08,268 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.559e+01 2.704e+01 2.867e+01 3.346e+01, threshold=5.408e+01, percent-clipped=0.0 2023-12-21 20:46:14,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-12-21 20:46:23,072 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=5.289e-03 2023-12-21 20:46:30,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.72 vs. limit=10.0 2023-12-21 20:46:33,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=224880.0, ans=0.0 2023-12-21 20:46:42,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.37 vs. limit=10.0 2023-12-21 20:46:46,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225013.33333333334, ans=0.1 2023-12-21 20:46:47,120 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:46:56,031 INFO [train.py:886] (1/4) Epoch 8, batch 400, loss[loss=0.01735, audio_tagging_loss=0.01735, over 25000.00 frames. ], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 4290659.83 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:46:57,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-12-21 20:47:01,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225080.0, ans=0.1 2023-12-21 20:47:07,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.80 vs. 
limit=15.0 2023-12-21 20:47:16,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0 2023-12-21 20:47:16,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=225213.33333333334, ans=0.0 2023-12-21 20:47:16,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=225213.33333333334, ans=0.2 2023-12-21 20:47:18,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=225213.33333333334, ans=0.07 2023-12-21 20:47:19,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225213.33333333334, ans=0.0 2023-12-21 20:47:29,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=225280.0, ans=0.125 2023-12-21 20:47:34,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=225280.0, ans=0.125 2023-12-21 20:47:42,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=225346.66666666666, ans=0.0 2023-12-21 20:47:48,118 INFO [train.py:886] (1/4) Epoch 8, batch 450, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4442503.35 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:47:51,844 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.539e+01 2.733e+01 2.950e+01 3.447e+01, threshold=5.466e+01, percent-clipped=0.0 2023-12-21 20:47:52,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-12-21 20:47:56,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2023-12-21 20:47:57,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-12-21 20:48:02,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=225480.0, ans=0.1 2023-12-21 20:48:06,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.94 vs. 
limit=22.5 2023-12-21 20:48:11,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=225546.66666666666, ans=0.125 2023-12-21 20:48:11,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=225546.66666666666, ans=0.0 2023-12-21 20:48:12,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=225546.66666666666, ans=0.04949747468305833 2023-12-21 20:48:18,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=225613.33333333334, ans=0.125 2023-12-21 20:48:31,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0 2023-12-21 20:48:40,730 INFO [train.py:886] (1/4) Epoch 8, batch 500, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4561676.36 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:48:44,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=225746.66666666666, ans=0.125 2023-12-21 20:48:47,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-12-21 20:48:51,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=225813.33333333334, ans=0.125 2023-12-21 20:48:52,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=15.0 2023-12-21 20:48:56,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=225813.33333333334, ans=0.07 2023-12-21 20:49:07,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=225880.0, ans=0.125 2023-12-21 20:49:08,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=225880.0, ans=0.125 2023-12-21 20:49:31,734 INFO [train.py:886] (1/4) Epoch 8, batch 550, loss[loss=0.01903, audio_tagging_loss=0.01903, over 25000.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4656015.80 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:49:35,530 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.537e+01 2.667e+01 2.842e+01 3.334e+01, threshold=5.333e+01, percent-clipped=0.0 2023-12-21 20:49:39,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=226080.0, ans=0.0 2023-12-21 20:49:42,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.50 vs. 
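limit=12.0

The dense scaling.py:213 traffic reports ScheduledFloat values: regularization hyper-parameters (dropout_p, skip_rate, prob, min_abs, scale_min and friends) that are functions of batch_count rather than constants, typically annealed from an aggressive early value toward a final one. A minimal sketch of a piecewise-linear schedule in that spirit; the breakpoints below are invented for illustration, and icefall's ScheduledFloat in scaling.py is more featureful:

```python
class ScheduledFloat:
    """float(sched) interpolates linearly between (batch_count, value)
    breakpoints and stays flat outside them, e.g.
    ScheduledFloat((0.0, 0.3), (20000.0, 0.1))."""
    def __init__(self, *points):
        self.points = sorted(points)   # [(batch_count, value), ...]
        self.batch_count = 0.0         # advanced by the training loop

    def __float__(self):
        pts, x = self.points, self.batch_count
        if x <= pts[0][0]:
            return float(pts[0][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
        return float(pts[-1][1])
```

Read this way, a line such as ScheduledFloat: name=...conv_skip_rate, batch_count=226946.66666666666, ans=0.0 is simply printing float(schedule) at the module's current batch_count.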
2023-12-21 20:49:55,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=226213.33333333334, ans=0.125 2023-12-21 20:50:04,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=226280.0, ans=0.125 2023-12-21 20:50:24,148 INFO [train.py:886] (1/4) Epoch 8, batch 600, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24750.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4722710.11 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:50:44,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2023-12-21 20:50:53,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.85 vs. limit=10.0 2023-12-21 20:50:54,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2023-12-21 20:51:01,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=12.0 2023-12-21 20:51:03,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=226613.33333333334, ans=0.0 2023-12-21 20:51:15,693 INFO [train.py:886] (1/4) Epoch 8, batch 650, loss[loss=0.01804, audio_tagging_loss=0.01804, over 24750.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4769315.67 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:51:20,087 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.706e+01 2.885e+01 3.077e+01 3.988e+01, threshold=5.770e+01, percent-clipped=0.0 2023-12-21 20:51:31,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=226813.33333333334, ans=0.2 2023-12-21 20:51:40,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0 2023-12-21 20:51:55,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=226946.66666666666, ans=0.0 2023-12-21 20:52:06,622 INFO [train.py:886] (1/4) Epoch 8, batch 700, loss[loss=0.01551, audio_tagging_loss=0.01551, over 24750.00 frames. ], tot_loss[loss=0.01614, audio_tagging_loss=0.01614, over 4807663.79 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:52:19,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227146.66666666666, ans=0.1 2023-12-21 20:52:21,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.14 vs.
limit=6.0 2023-12-21 20:52:27,409 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:52:34,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=227213.33333333334, ans=0.125 2023-12-21 20:52:48,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. limit=22.5 2023-12-21 20:52:59,428 INFO [train.py:886] (1/4) Epoch 8, batch 750, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4834571.61 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:53:03,146 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.577e+01 2.736e+01 2.943e+01 3.509e+01, threshold=5.472e+01, percent-clipped=0.0 2023-12-21 20:53:21,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=227546.66666666666, ans=0.0 2023-12-21 20:53:24,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-12-21 20:53:26,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=227546.66666666666, ans=0.1 2023-12-21 20:53:38,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=227613.33333333334, ans=0.125 2023-12-21 20:53:42,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=227680.0, ans=0.125 2023-12-21 20:53:50,470 INFO [train.py:886] (1/4) Epoch 8, batch 800, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4864213.96 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:53:50,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=227746.66666666666, ans=0.125 2023-12-21 20:53:53,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2023-12-21 20:54:22,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=227946.66666666666, ans=0.0 2023-12-21 20:54:43,032 INFO [train.py:886] (1/4) Epoch 8, batch 850, loss[loss=0.01755, audio_tagging_loss=0.01755, over 21882.00 frames. ], tot_loss[loss=0.01577, audio_tagging_loss=0.01577, over 4882555.37 frames. 
], batch size: 107, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:54:46,742 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.587e+01 2.731e+01 2.941e+01 3.321e+01, threshold=5.462e+01, percent-clipped=0.0 2023-12-21 20:54:55,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=228146.66666666666, ans=0.125 2023-12-21 20:55:13,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=228280.0, ans=0.0 2023-12-21 20:55:15,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=228280.0, ans=0.0 2023-12-21 20:55:28,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=228346.66666666666, ans=0.125 2023-12-21 20:55:34,602 INFO [train.py:886] (1/4) Epoch 8, batch 900, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4904057.30 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:55:40,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=228413.33333333334, ans=0.2 2023-12-21 20:55:52,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2023-12-21 20:55:56,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=228546.66666666666, ans=0.0 2023-12-21 20:56:26,473 INFO [train.py:886] (1/4) Epoch 8, batch 950, loss[loss=0.01819, audio_tagging_loss=0.01819, over 24750.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4911886.63 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:56:30,922 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.642e+01 2.776e+01 2.984e+01 3.977e+01, threshold=5.552e+01, percent-clipped=0.0 2023-12-21 20:56:33,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=228746.66666666666, ans=0.125 2023-12-21 20:56:54,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=228880.0, ans=0.0 2023-12-21 20:56:55,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-12-21 20:57:05,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0 2023-12-21 20:57:12,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-12-21 20:57:18,759 INFO [train.py:886] (1/4) Epoch 8, batch 1000, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4922894.84 frames. 
], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:57:25,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229080.0, ans=0.1 2023-12-21 20:57:34,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-21 20:57:59,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=229346.66666666666, ans=0.0 2023-12-21 20:58:03,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=229346.66666666666, ans=0.0 2023-12-21 20:58:10,995 INFO [train.py:886] (1/4) Epoch 8, batch 1050, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4931598.83 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:58:14,764 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.565e+01 2.750e+01 2.956e+01 3.659e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 20:58:21,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229480.0, ans=0.1 2023-12-21 20:58:34,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=229546.66666666666, ans=0.0 2023-12-21 20:58:39,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=229546.66666666666, ans=0.0 2023-12-21 20:59:02,595 INFO [train.py:886] (1/4) Epoch 8, batch 1100, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4939173.59 frames. ], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:59:02,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=229746.66666666666, ans=0.0 2023-12-21 20:59:03,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=229746.66666666666, ans=0.0 2023-12-21 20:59:04,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2023-12-21 20:59:17,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=229813.33333333334, ans=0.125 2023-12-21 20:59:19,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=229813.33333333334, ans=10.0 2023-12-21 20:59:22,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=229880.0, ans=0.125 2023-12-21 20:59:31,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=229880.0, ans=0.125 2023-12-21 20:59:37,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=229946.66666666666, ans=0.125 2023-12-21 20:59:54,257 INFO [train.py:886] (1/4) Epoch 8, batch 1150, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. 
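], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4942098.62 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0

Each train.py:886 entry prints two losses: loss[...] for the current batch and tot_loss[...], a frame-weighted running average. The frame counts suggest a decayed average with an effective window of about 200 batches: at roughly 25,000 frames per batch that gives the ~4.94e6-frame steady state printed here, and at each epoch start the average collapses back to a single batch (tot_loss over 20843.00 frames at Epoch 8, batch 0 above). A sketch under that assumption, with hypothetical names; icefall's own tracker may differ in detail:

```python
class RunningLoss:
    """Decayed frame-weighted loss average, approximating the
    tot_loss[...] entries; the 200-batch window is assumed."""
    def __init__(self, window: int = 200):
        self.alpha = 1.0 - 1.0 / window
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_loss: float, num_frames: float) -> None:
        self.loss_sum = self.alpha * self.loss_sum + batch_loss * num_frames
        self.frames = self.alpha * self.frames + num_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```

As a check, the frame count converges to num_frames / (1 - alpha), i.e. about 25,000 x 200 = 5.0e6, close to the "over 4942098.62 frames" above.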
2023-12-21 20:59:58,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=230080.0, ans=0.2 2023-12-21 20:59:58,774 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.593e+01 2.758e+01 2.932e+01 3.936e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-21 21:00:06,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=230146.66666666666, ans=0.0 2023-12-21 21:00:15,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2023-12-21 21:00:21,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=230213.33333333334, ans=0.0 2023-12-21 21:00:46,108 INFO [train.py:886] (1/4) Epoch 8, batch 1200, loss[loss=0.01714, audio_tagging_loss=0.01714, over 25000.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4946985.60 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 21:01:00,061 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.236e-01 2023-12-21 21:01:06,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2023-12-21 21:01:17,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2023-12-21 21:01:35,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230680.0, ans=0.1 2023-12-21 21:01:38,505 INFO [train.py:886] (1/4) Epoch 8, batch 1250, loss[loss=0.0169, audio_tagging_loss=0.0169, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4948675.55 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:01:42,237 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.655e+01 2.780e+01 2.996e+01 3.618e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 21:01:57,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=230813.33333333334, ans=0.125 2023-12-21 21:02:05,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=230880.0, ans=0.125 2023-12-21 21:02:11,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=230946.66666666666, ans=0.0 2023-12-21 21:02:30,632 INFO [train.py:886] (1/4) Epoch 8, batch 1300, loss[loss=0.01758, audio_tagging_loss=0.01758, over 25000.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4946620.39 frames.
], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:02:35,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=231080.0, ans=0.125 2023-12-21 21:02:44,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=231146.66666666666, ans=0.125 2023-12-21 21:02:54,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-12-21 21:02:58,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=231213.33333333334, ans=0.2 2023-12-21 21:03:09,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2023-12-21 21:03:11,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=231346.66666666666, ans=0.2 2023-12-21 21:03:20,606 INFO [train.py:886] (1/4) Epoch 8, batch 1350, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4947267.41 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:03:25,734 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.556e+01 2.752e+01 2.967e+01 4.307e+01, threshold=5.505e+01, percent-clipped=0.0 2023-12-21 21:03:52,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=231613.33333333334, ans=0.0 2023-12-21 21:03:58,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=15.0 2023-12-21 21:03:59,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231613.33333333334, ans=0.1 2023-12-21 21:04:01,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=231613.33333333334, ans=0.0 2023-12-21 21:04:01,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=231613.33333333334, ans=0.2 2023-12-21 21:04:04,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=231680.0, ans=0.0 2023-12-21 21:04:13,937 INFO [train.py:886] (1/4) Epoch 8, batch 1400, loss[loss=0.01934, audio_tagging_loss=0.01934, over 24750.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4943774.52 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:04:19,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.22 vs. limit=15.0 2023-12-21 21:04:28,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=231813.33333333334, ans=0.0 2023-12-21 21:04:29,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=231813.33333333334, ans=0.125 2023-12-21 21:04:36,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=15.0 2023-12-21 21:04:39,810 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:04:40,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=231880.0, ans=0.125 2023-12-21 21:04:45,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=231946.66666666666, ans=0.125 2023-12-21 21:05:00,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-12-21 21:05:05,550 INFO [train.py:886] (1/4) Epoch 8, batch 1450, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4953964.34 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:05:09,935 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.540e+01 2.719e+01 2.872e+01 3.689e+01, threshold=5.439e+01, percent-clipped=0.0 2023-12-21 21:05:10,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=232080.0, ans=0.125 2023-12-21 21:05:21,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=232146.66666666666, ans=0.125 2023-12-21 21:05:22,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=232146.66666666666, ans=0.1 2023-12-21 21:05:26,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=232213.33333333334, ans=0.125 2023-12-21 21:05:28,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2023-12-21 21:05:29,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=232213.33333333334, ans=0.2 2023-12-21 21:05:37,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=232280.0, ans=0.0 2023-12-21 21:05:38,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=232280.0, ans=0.07 2023-12-21 21:05:53,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=232346.66666666666, ans=0.0 2023-12-21 21:05:54,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=232346.66666666666, ans=0.0 2023-12-21 21:05:57,192 INFO [train.py:886] (1/4) Epoch 8, batch 1500, loss[loss=0.01711, audio_tagging_loss=0.01711, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4955189.88 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:06:05,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. 
limit=15.0 2023-12-21 21:06:14,571 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:06:17,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=12.0 2023-12-21 21:06:27,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232613.33333333334, ans=0.1 2023-12-21 21:06:27,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=232613.33333333334, ans=0.2 2023-12-21 21:06:35,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232613.33333333334, ans=0.1 2023-12-21 21:06:48,541 INFO [train.py:886] (1/4) Epoch 8, batch 1550, loss[loss=0.01707, audio_tagging_loss=0.01707, over 24750.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4952283.79 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:06:52,988 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.641e+01 2.774e+01 2.948e+01 3.544e+01, threshold=5.547e+01, percent-clipped=0.0 2023-12-21 21:07:03,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232813.33333333334, ans=0.1 2023-12-21 21:07:25,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2023-12-21 21:07:31,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233013.33333333334, ans=0.125 2023-12-21 21:07:39,869 INFO [train.py:886] (1/4) Epoch 8, batch 1600, loss[loss=0.0191, audio_tagging_loss=0.0191, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4948122.26 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:07:41,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=233080.0, ans=0.125 2023-12-21 21:07:48,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233080.0, ans=0.1 2023-12-21 21:08:05,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2023-12-21 21:08:23,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=233346.66666666666, ans=0.1 2023-12-21 21:08:26,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=15.0 2023-12-21 21:08:31,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=233413.33333333334, ans=0.95 2023-12-21 21:08:32,019 INFO [train.py:886] (1/4) Epoch 8, batch 1650, loss[loss=0.01751, audio_tagging_loss=0.01751, over 21942.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4946328.94 frames. 
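[Note on the ScheduledFloat records ("name=..., batch_count=..., ans=..."): these show regularization constants that are functions of the global batch count. A plausible sketch follows, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are made up for illustration, and ScheduledFloatSketch is not the real scaling.py class.]

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over the global batch count."""
        def __init__(self, *points):              # e.g. (0.0, 0.3), (20000.0, 0.1)
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]                     # past the last breakpoint

    # illustrative breakpoints: a dropout_p annealing from 0.3 down to 0.1
    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(230680.0))              # -> 0.1, matching "ans=0.1" here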
], batch size: 107, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:08:35,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2023-12-21 21:08:35,748 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.707e+01 2.843e+01 2.988e+01 3.739e+01, threshold=5.687e+01, percent-clipped=0.0 2023-12-21 21:08:36,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=233413.33333333334, ans=0.0 2023-12-21 21:08:44,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=233480.0, ans=0.125 2023-12-21 21:08:52,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233546.66666666666, ans=0.125 2023-12-21 21:09:04,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=233613.33333333334, ans=0.0 2023-12-21 21:09:05,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.49 vs. limit=15.0 2023-12-21 21:09:23,675 INFO [train.py:886] (1/4) Epoch 8, batch 1700, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24017.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4949510.84 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:09:31,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=233746.66666666666, ans=0.0 2023-12-21 21:09:48,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-12-21 21:09:50,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=12.0 2023-12-21 21:10:06,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234013.33333333334, ans=0.1 2023-12-21 21:10:13,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=234013.33333333334, ans=0.05 2023-12-21 21:10:15,214 INFO [train.py:886] (1/4) Epoch 8, batch 1750, loss[loss=0.01636, audio_tagging_loss=0.01636, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4953850.34 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:10:19,035 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.526e+01 2.688e+01 2.921e+01 3.740e+01, threshold=5.376e+01, percent-clipped=0.0 2023-12-21 21:10:22,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=234080.0, ans=0.125 2023-12-21 21:10:27,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=234146.66666666666, ans=0.2 2023-12-21 21:10:51,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.43 vs. 
limit=15.0 2023-12-21 21:11:03,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-12-21 21:11:08,760 INFO [train.py:886] (1/4) Epoch 8, batch 1800, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4954562.11 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:11:15,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=15.0 2023-12-21 21:11:25,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=234480.0, ans=0.1 2023-12-21 21:11:25,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-21 21:11:30,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2023-12-21 21:11:36,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=234546.66666666666, ans=0.0 2023-12-21 21:11:47,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234613.33333333334, ans=0.0 2023-12-21 21:11:52,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=234680.0, ans=0.0 2023-12-21 21:11:59,301 INFO [train.py:886] (1/4) Epoch 8, batch 1850, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24050.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4956394.62 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:12:04,461 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.642e+01 2.796e+01 3.049e+01 4.216e+01, threshold=5.592e+01, percent-clipped=0.0 2023-12-21 21:12:13,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-12-21 21:12:22,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=234880.0, ans=0.0 2023-12-21 21:12:30,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=234946.66666666666, ans=0.0 2023-12-21 21:12:48,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=235013.33333333334, ans=0.125 2023-12-21 21:12:50,829 INFO [train.py:886] (1/4) Epoch 8, batch 1900, loss[loss=0.01738, audio_tagging_loss=0.01738, over 24077.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4949506.20 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:12:51,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. 
limit=15.0 2023-12-21 21:13:15,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235213.33333333334, ans=0.1 2023-12-21 21:13:28,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2023-12-21 21:13:33,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2023-12-21 21:13:42,949 INFO [train.py:886] (1/4) Epoch 8, batch 1950, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4946974.55 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:13:47,359 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.599e+01 2.763e+01 2.892e+01 3.548e+01, threshold=5.526e+01, percent-clipped=0.0 2023-12-21 21:13:51,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=12.0 2023-12-21 21:14:04,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235546.66666666666, ans=0.1 2023-12-21 21:14:05,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=235546.66666666666, ans=0.0 2023-12-21 21:14:17,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235613.33333333334, ans=0.125 2023-12-21 21:14:22,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235613.33333333334, ans=0.1 2023-12-21 21:14:31,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2023-12-21 21:14:34,267 INFO [train.py:886] (1/4) Epoch 8, batch 2000, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4944120.01 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:14:43,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=235746.66666666666, ans=0.0 2023-12-21 21:14:50,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=235813.33333333334, ans=0.125 2023-12-21 21:15:03,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=235880.0, ans=0.0 2023-12-21 21:15:04,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=235946.66666666666, ans=0.0 2023-12-21 21:15:12,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=235946.66666666666, ans=0.07 2023-12-21 21:15:15,027 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.568e-01 2023-12-21 21:15:21,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.17 vs. 
limit=15.0 2023-12-21 21:15:22,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=236013.33333333334, ans=0.125 2023-12-21 21:15:24,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236013.33333333334, ans=0.1 2023-12-21 21:15:26,864 INFO [train.py:886] (1/4) Epoch 8, batch 2050, loss[loss=0.01727, audio_tagging_loss=0.01727, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4949190.42 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:15:29,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=15.0 2023-12-21 21:15:31,346 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.542e+01 2.684e+01 2.827e+01 3.551e+01, threshold=5.367e+01, percent-clipped=0.0 2023-12-21 21:15:40,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-12-21 21:15:41,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=236146.66666666666, ans=0.125 2023-12-21 21:15:49,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=236213.33333333334, ans=0.125 2023-12-21 21:15:56,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2023-12-21 21:16:02,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=15.0 2023-12-21 21:16:05,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236280.0, ans=0.125 2023-12-21 21:16:18,469 INFO [train.py:886] (1/4) Epoch 8, batch 2100, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4950717.31 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:16:22,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=236413.33333333334, ans=0.0 2023-12-21 21:16:27,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=236413.33333333334, ans=0.125 2023-12-21 21:16:42,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. 
limit=15.0 2023-12-21 21:16:50,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=236613.33333333334, ans=0.125 2023-12-21 21:16:52,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236613.33333333334, ans=0.1 2023-12-21 21:16:59,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=236680.0, ans=0.1 2023-12-21 21:16:59,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2023-12-21 21:17:05,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236680.0, ans=0.125 2023-12-21 21:17:05,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.05 vs. limit=10.0 2023-12-21 21:17:07,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236680.0, ans=0.125 2023-12-21 21:17:10,432 INFO [train.py:886] (1/4) Epoch 8, batch 2150, loss[loss=0.0161, audio_tagging_loss=0.0161, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4956664.04 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:17:10,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2023-12-21 21:17:10,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-12-21 21:17:14,091 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.652e+01 2.733e+01 2.907e+01 3.375e+01, threshold=5.466e+01, percent-clipped=0.0 2023-12-21 21:17:17,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=236746.66666666666, ans=0.025 2023-12-21 21:17:22,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=236813.33333333334, ans=0.0 2023-12-21 21:17:35,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=236880.0, ans=0.125 2023-12-21 21:17:43,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0 2023-12-21 21:18:01,944 INFO [train.py:886] (1/4) Epoch 8, batch 2200, loss[loss=0.01712, audio_tagging_loss=0.01712, over 24750.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 4951311.89 frames. 
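[Note on grad_scale: the batch summaries show grad_scale doubling from 32.0 to 64.0 around Epoch 8, batch 2050, the signature of mixed-precision loss scaling (use_fp16): the scaler grows its scale after a long run of overflow-free fp16 steps. A sketch of the standard torch.cuda.amp.GradScaler loop follows (requires a CUDA device); init_scale and growth_interval below are illustrative, not the values this recipe used.]

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       growth_interval=2000)  # illustrative
    model = torch.nn.Linear(80, 527).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    for step in range(3):
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(4, 80, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()   # backward through the scaled loss
        scaler.step(opt)                # unscales grads; skips step on inf/nan
        scaler.update()                 # doubles the scale after enough
                                        # overflow-free steps; halves on overflow
        print(step, scaler.get_scale())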
], batch size: 99, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:18:04,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=237080.0, ans=0.125 2023-12-21 21:18:10,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=237080.0, ans=0.0 2023-12-21 21:18:20,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=237146.66666666666, ans=0.0 2023-12-21 21:18:39,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=237280.0, ans=0.125 2023-12-21 21:18:49,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=237346.66666666666, ans=0.0 2023-12-21 21:18:53,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=237413.33333333334, ans=0.125 2023-12-21 21:18:53,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2023-12-21 21:18:53,991 INFO [train.py:886] (1/4) Epoch 8, batch 2250, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4944243.76 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:18:55,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=237413.33333333334, ans=0.125 2023-12-21 21:18:59,105 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.643e+01 2.757e+01 2.915e+01 3.398e+01, threshold=5.513e+01, percent-clipped=0.0 2023-12-21 21:18:59,405 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:19:43,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=237680.0, ans=0.125 2023-12-21 21:19:46,180 INFO [train.py:886] (1/4) Epoch 8, batch 2300, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4943415.77 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:19:54,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=237746.66666666666, ans=0.2 2023-12-21 21:20:07,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-12-21 21:20:13,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2023-12-21 21:20:38,780 INFO [train.py:886] (1/4) Epoch 8, batch 2350, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4947127.61 frames. 
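[Note on the Whitening records ("metric=... vs. limit=..."): each compares a per-module activation statistic against a scheduled limit. One way such a metric could be defined is sketched below, assuming it measures how far the channel covariance is from a multiple of the identity (exactly 1.0 for perfectly "white" activations); whitening_metric is an illustration, not the actual scaling.py code.]

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape                       # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n      # per-group channel covariance
        d = cov.shape[-1]
        num = (cov * cov).sum(dim=(1, 2)) / d                   # mean squared eigenvalue
        den = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1) ** 2  # squared mean eigenvalue
        return (num / den).mean().item()     # 1.0 iff cov is a multiple of I

    print(whitening_metric(torch.randn(1000, 512)))  # close to 1.0 for white data

[A metric well above its limit, like the "metric=17.25 vs. limit=15.0" events in this section, would then indicate a few directions dominating that module's output.]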
], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:20:42,602 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.503e+01 2.677e+01 2.839e+01 3.914e+01, threshold=5.353e+01, percent-clipped=0.0 2023-12-21 21:20:43,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2023-12-21 21:20:49,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238146.66666666666, ans=0.125 2023-12-21 21:21:06,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=238213.33333333334, ans=0.0 2023-12-21 21:21:09,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=238280.0, ans=0.0 2023-12-21 21:21:19,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2023-12-21 21:21:29,906 INFO [train.py:886] (1/4) Epoch 8, batch 2400, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4948720.68 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:21:35,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=238413.33333333334, ans=10.0 2023-12-21 21:21:44,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=15.0 2023-12-21 21:22:10,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=238613.33333333334, ans=0.2 2023-12-21 21:22:11,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=238680.0, ans=0.0 2023-12-21 21:22:22,521 INFO [train.py:886] (1/4) Epoch 8, batch 2450, loss[loss=0.01463, audio_tagging_loss=0.01463, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4950563.06 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:22:26,970 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.604e+01 2.785e+01 2.949e+01 3.842e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 21:22:40,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.35 vs. 
limit=15.0 2023-12-21 21:22:43,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=238880.0, ans=0.125 2023-12-21 21:22:48,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=238880.0, ans=15.0 2023-12-21 21:22:52,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=238880.0, ans=0.0 2023-12-21 21:23:01,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=238946.66666666666, ans=0.2 2023-12-21 21:23:13,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=239080.0, ans=0.0 2023-12-21 21:23:14,636 INFO [train.py:886] (1/4) Epoch 8, batch 2500, loss[loss=0.01809, audio_tagging_loss=0.01809, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4946479.08 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:23:17,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=239080.0, ans=0.0 2023-12-21 21:23:46,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=239280.0, ans=0.125 2023-12-21 21:23:53,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-12-21 21:24:05,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-12-21 21:24:05,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0 2023-12-21 21:24:06,255 INFO [train.py:886] (1/4) Epoch 8, batch 2550, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4946198.83 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:24:07,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=239413.33333333334, ans=0.1 2023-12-21 21:24:08,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=239413.33333333334, ans=0.125 2023-12-21 21:24:09,935 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.703e+01 2.852e+01 3.044e+01 3.567e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-21 21:24:32,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-12-21 21:24:45,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=239613.33333333334, ans=0.125 2023-12-21 21:24:46,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=239680.0, ans=0.0 2023-12-21 21:24:57,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. 
limit=6.0 2023-12-21 21:24:58,487 INFO [train.py:886] (1/4) Epoch 8, batch 2600, loss[loss=0.01563, audio_tagging_loss=0.01563, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4948793.03 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:25:22,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=239880.0, ans=0.0 2023-12-21 21:25:43,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=12.0 2023-12-21 21:25:50,965 INFO [train.py:886] (1/4) Epoch 8, batch 2650, loss[loss=0.01781, audio_tagging_loss=0.01781, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4950927.16 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:25:55,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.630e+01 2.789e+01 2.971e+01 3.583e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 21:26:00,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=240146.66666666666, ans=0.035 2023-12-21 21:26:12,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240213.33333333334, ans=0.125 2023-12-21 21:26:16,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240213.33333333334, ans=0.1 2023-12-21 21:26:16,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=240213.33333333334, ans=0.125 2023-12-21 21:26:19,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=240213.33333333334, ans=0.125 2023-12-21 21:26:35,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.278e-02 2023-12-21 21:26:42,494 INFO [train.py:886] (1/4) Epoch 8, batch 2700, loss[loss=0.01474, audio_tagging_loss=0.01474, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4953815.91 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:26:49,223 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.826e-02 2023-12-21 21:26:54,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=240480.0, ans=0.125 2023-12-21 21:26:56,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-12-21 21:26:57,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.47 vs. 
limit=10.0 2023-12-21 21:27:04,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=240546.66666666666, ans=0.04949747468305833 2023-12-21 21:27:05,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=240546.66666666666, ans=0.0 2023-12-21 21:27:33,954 INFO [train.py:886] (1/4) Epoch 8, batch 2750, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4957339.51 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:27:35,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=240746.66666666666, ans=0.125 2023-12-21 21:27:37,717 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.574e+01 2.741e+01 2.911e+01 3.788e+01, threshold=5.483e+01, percent-clipped=0.0 2023-12-21 21:27:40,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=240746.66666666666, ans=0.0 2023-12-21 21:28:05,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.26 vs. limit=22.5 2023-12-21 21:28:20,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=241013.33333333334, ans=10.0 2023-12-21 21:28:25,019 INFO [train.py:886] (1/4) Epoch 8, batch 2800, loss[loss=0.01702, audio_tagging_loss=0.01702, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4959802.50 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:28:40,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241146.66666666666, ans=0.1 2023-12-21 21:29:14,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2023-12-21 21:29:16,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241346.66666666666, ans=0.1 2023-12-21 21:29:17,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=241413.33333333334, ans=0.07 2023-12-21 21:29:17,692 INFO [train.py:886] (1/4) Epoch 8, batch 2850, loss[loss=0.01469, audio_tagging_loss=0.01469, over 24750.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4946432.47 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:29:22,198 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.608e+01 2.780e+01 2.965e+01 3.474e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 21:29:23,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=28.85 vs. 
limit=22.5 2023-12-21 21:29:29,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=241480.0, ans=0.2 2023-12-21 21:29:33,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=241480.0, ans=0.2 2023-12-21 21:29:34,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2023-12-21 21:30:08,457 INFO [train.py:886] (1/4) Epoch 8, batch 2900, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4943353.54 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:30:20,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=241813.33333333334, ans=0.125 2023-12-21 21:30:24,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=241813.33333333334, ans=0.2 2023-12-21 21:30:31,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=241880.0, ans=0.125 2023-12-21 21:30:33,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=241880.0, ans=0.2 2023-12-21 21:30:35,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=241880.0, ans=0.125 2023-12-21 21:30:42,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=241946.66666666666, ans=0.125 2023-12-21 21:30:57,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=242013.33333333334, ans=0.0 2023-12-21 21:31:00,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=242080.0, ans=0.125 2023-12-21 21:31:01,406 INFO [train.py:886] (1/4) Epoch 8, batch 2950, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4949164.80 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:31:03,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.28 vs. limit=22.5 2023-12-21 21:31:04,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=242080.0, ans=10.0 2023-12-21 21:31:05,143 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.582e+01 2.728e+01 2.928e+01 3.566e+01, threshold=5.455e+01, percent-clipped=0.0 2023-12-21 21:31:11,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=242146.66666666666, ans=0.95 2023-12-21 21:31:28,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.02 vs. 
limit=15.0 2023-12-21 21:31:33,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=242280.0, ans=0.2 2023-12-21 21:31:44,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=242346.66666666666, ans=0.0 2023-12-21 21:31:51,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-21 21:31:52,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242413.33333333334, ans=0.1 2023-12-21 21:31:53,111 INFO [train.py:886] (1/4) Epoch 8, batch 3000, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4944927.96 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:31:53,112 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 21:32:03,297 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6878, 3.0317, 2.9058, 2.3569], device='cuda:1') 2023-12-21 21:32:06,788 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1882, 3.4509, 3.4772, 3.1252], device='cuda:1') 2023-12-21 21:32:14,410 INFO [train.py:917] (1/4) Epoch 8, validation: loss=0.03648, audio_tagging_loss=0.03648, over 3737520.00 frames. 2023-12-21 21:32:14,411 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 21:32:15,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-12-21 21:32:16,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=242413.33333333334, ans=0.125 2023-12-21 21:32:23,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=242480.0, ans=0.0 2023-12-21 21:32:31,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=242480.0, ans=0.0 2023-12-21 21:32:36,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=15.0 2023-12-21 21:32:42,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=242546.66666666666, ans=0.0 2023-12-21 21:32:54,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=12.0 2023-12-21 21:32:59,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=242680.0, ans=0.0 2023-12-21 21:33:05,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0 2023-12-21 21:33:06,490 INFO [train.py:886] (1/4) Epoch 8, batch 3050, loss[loss=0.01551, audio_tagging_loss=0.01551, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4952487.53 frames. 
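[Note on the validation pass above: the zipformer.py records print attn_weights_entropy tensors, one value per attention head, summarizing how peaked each head's attention distribution is (low entropy = sharply focused attention). A small sketch of that diagnostic; the function name is mine, but the entropy computation is standard.]

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys); each row sums to 1
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query
        return ent.mean(dim=-1)                           # average per head

    attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
    print(attn_weights_entropy(attn))  # four values, one per head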
], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:33:10,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.603e+01 2.768e+01 2.944e+01 3.581e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 21:33:13,538 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.623e-03 2023-12-21 21:33:25,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2023-12-21 21:33:52,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=243013.33333333334, ans=0.0 2023-12-21 21:33:56,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.02 vs. limit=5.0 2023-12-21 21:33:57,601 INFO [train.py:886] (1/4) Epoch 8, batch 3100, loss[loss=0.01748, audio_tagging_loss=0.01748, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4952522.38 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:34:15,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=243146.66666666666, ans=0.125 2023-12-21 21:34:27,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=243213.33333333334, ans=0.125 2023-12-21 21:34:30,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=243280.0, ans=0.2 2023-12-21 21:34:33,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=243280.0, ans=0.125 2023-12-21 21:34:41,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=243346.66666666666, ans=0.0 2023-12-21 21:34:42,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=15.0 2023-12-21 21:34:49,034 INFO [train.py:886] (1/4) Epoch 8, batch 3150, loss[loss=0.01637, audio_tagging_loss=0.01637, over 24750.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4951125.50 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:34:52,860 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.832e+01 3.035e+01 3.552e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-21 21:34:59,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=243480.0, ans=0.125 2023-12-21 21:34:59,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-21 21:35:03,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=243480.0, ans=0.125 2023-12-21 21:35:08,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-12-21 21:35:42,049 INFO [train.py:886] (1/4) Epoch 8, batch 3200, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. 
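[Note on the loss numbers: each batch summary reports both the current batch's loss[...] and a tot_loss[...] over a roughly constant ~4.95M frames, which suggests a frame-weighted average over a bounded window of recent batches. A sketch under that assumption; RunningLoss and the window size are illustrative.]

    from collections import deque

    class RunningLoss:
        def __init__(self, max_batches=200):            # window size assumed
            self.window = deque(maxlen=max_batches)     # (loss, num_frames) pairs

        def update(self, loss, num_frames):
            self.window.append((loss, num_frames))
            frames = sum(f for _, f in self.window)
            tot = sum(l * f for l, f in self.window) / frames
            return tot, frames

    tracker = RunningLoss()
    tot, frames = tracker.update(0.01296, 24750.0)
    print(f"tot_loss[loss={tot:.5f}, over {frames:.2f} frames.]")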
], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 4952400.63 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:35:48,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=243746.66666666666, ans=0.0 2023-12-21 21:35:49,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=243746.66666666666, ans=0.2 2023-12-21 21:35:50,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=243813.33333333334, ans=0.5 2023-12-21 21:36:33,920 INFO [train.py:886] (1/4) Epoch 8, batch 3250, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4953205.07 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:36:37,713 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.581e+01 2.751e+01 2.965e+01 3.733e+01, threshold=5.502e+01, percent-clipped=0.0 2023-12-21 21:36:42,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=244080.0, ans=0.0 2023-12-21 21:36:47,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=244146.66666666666, ans=0.5 2023-12-21 21:37:14,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.33 vs. limit=22.5 2023-12-21 21:37:25,274 INFO [train.py:886] (1/4) Epoch 8, batch 3300, loss[loss=0.01571, audio_tagging_loss=0.01571, over 25000.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 4955301.91 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:37:30,288 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.278e-01 2023-12-21 21:38:02,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=244613.33333333334, ans=0.0 2023-12-21 21:38:05,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=244680.0, ans=0.0 2023-12-21 21:38:07,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=244680.0, ans=0.05 2023-12-21 21:38:17,048 INFO [train.py:886] (1/4) Epoch 8, batch 3350, loss[loss=0.01665, audio_tagging_loss=0.01665, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4958501.85 frames. 
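[Note on the WithLoss records ("name=...self_attn_weights, loss-sum=..."): these track an auxiliary penalty attached to the attention weights, with loss-sum=0.000e+00 whenever the penalty was inactive over the logging interval. A hedged sketch of the bookkeeping, assuming the module computes a penalty on its input and the caller folds it into the training loss; WithLossSketch is not the real scaling.py class.]

    import torch

    class WithLossSketch(torch.nn.Module):
        def __init__(self, penalty_fn):
            super().__init__()
            self.penalty_fn = penalty_fn
            self.loss_sum = 0.0                # accumulated for the log line

        def forward(self, x):
            penalty = self.penalty_fn(x) if self.training else x.new_zeros(())
            self.loss_sum += float(penalty.detach())
            return x, penalty                  # caller adds penalty to the loss

    layer = WithLossSketch(lambda w: w.clamp(min=0.0).mean())
    x, pen = layer(torch.randn(2, 4, 8))
    print(f"loss-sum={layer.loss_sum:.3e}")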
], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:38:19,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=244746.66666666666, ans=0.125 2023-12-21 21:38:21,608 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.538e+01 2.708e+01 2.913e+01 3.391e+01, threshold=5.415e+01, percent-clipped=0.0 2023-12-21 21:38:23,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=244746.66666666666, ans=0.125 2023-12-21 21:38:34,751 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:38:40,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. limit=15.0 2023-12-21 21:38:56,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-12-21 21:39:05,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=245013.33333333334, ans=0.0 2023-12-21 21:39:08,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245080.0, ans=0.1 2023-12-21 21:39:08,644 INFO [train.py:886] (1/4) Epoch 8, batch 3400, loss[loss=0.01525, audio_tagging_loss=0.01525, over 21198.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4957710.66 frames. ], batch size: 107, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:39:38,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=245280.0, ans=0.0 2023-12-21 21:39:39,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=245280.0, ans=0.2 2023-12-21 21:39:39,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=245280.0, ans=0.125 2023-12-21 21:39:40,996 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:39:44,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=245280.0, ans=0.0 2023-12-21 21:39:47,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=245280.0, ans=0.2 2023-12-21 21:39:55,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=245346.66666666666, ans=0.2 2023-12-21 21:39:55,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=245346.66666666666, ans=0.07 2023-12-21 21:40:01,072 INFO [train.py:886] (1/4) Epoch 8, batch 3450, loss[loss=0.01561, audio_tagging_loss=0.01561, over 24750.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4958854.90 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:40:01,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. 
limit=15.0 2023-12-21 21:40:04,819 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.638e+01 2.791e+01 3.014e+01 3.775e+01, threshold=5.582e+01, percent-clipped=0.0 2023-12-21 21:40:06,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=245413.33333333334, ans=0.09899494936611666 2023-12-21 21:40:32,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=245613.33333333334, ans=0.0 2023-12-21 21:40:44,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-12-21 21:40:52,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.003e-03 2023-12-21 21:40:53,257 INFO [train.py:886] (1/4) Epoch 8, batch 3500, loss[loss=0.01732, audio_tagging_loss=0.01732, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4953502.17 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:40:56,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5 2023-12-21 21:41:06,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-12-21 21:41:25,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=245946.66666666666, ans=0.0 2023-12-21 21:41:32,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=245946.66666666666, ans=0.0 2023-12-21 21:41:35,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=246013.33333333334, ans=0.1 2023-12-21 21:41:38,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.44 vs. limit=22.5 2023-12-21 21:41:39,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=246013.33333333334, ans=0.07 2023-12-21 21:41:44,052 INFO [train.py:886] (1/4) Epoch 8, batch 3550, loss[loss=0.01496, audio_tagging_loss=0.01496, over 25000.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4948985.64 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:41:48,503 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.549e+01 2.709e+01 2.899e+01 3.607e+01, threshold=5.417e+01, percent-clipped=0.0 2023-12-21 21:42:00,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.93 vs. 
limit=15.0 2023-12-21 21:42:10,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246213.33333333334, ans=0.1 2023-12-21 21:42:11,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=246213.33333333334, ans=0.125 2023-12-21 21:42:13,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246213.33333333334, ans=0.1 2023-12-21 21:42:30,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-21 21:42:33,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=246346.66666666666, ans=0.0 2023-12-21 21:42:37,281 INFO [train.py:886] (1/4) Epoch 8, batch 3600, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4950088.03 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:42:42,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=246413.33333333334, ans=0.2 2023-12-21 21:42:50,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=246480.0, ans=0.0 2023-12-21 21:43:01,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=246546.66666666666, ans=0.5 2023-12-21 21:43:01,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-12-21 21:43:29,312 INFO [train.py:886] (1/4) Epoch 8, batch 3650, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4945941.30 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:43:33,751 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.621e+01 2.795e+01 3.047e+01 3.909e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-21 21:43:55,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=246880.0, ans=0.1 2023-12-21 21:44:02,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=246946.66666666666, ans=0.125 2023-12-21 21:44:15,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.42 vs. limit=22.5 2023-12-21 21:44:20,650 INFO [train.py:886] (1/4) Epoch 8, batch 3700, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4947100.37 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:44:49,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-12-21 21:45:11,564 INFO [train.py:886] (1/4) Epoch 8, batch 3750, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4945728.02 frames. 
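The recurring WARNING lines from optim.py report five order statistics of recent gradient norms (apparently min, 25%, median, 75%, max), the clipping threshold in effect, and the fraction of batches clipped so far. In every warning above the threshold is exactly 2.0 times the printed median (e.g. threshold=5.417e+01 against median 2.709e+01), matching Clipping_scale=2.0, so the rule is evidently threshold = clipping_scale * running median. A minimal sketch of such a clipper; the window length, update cadence, and all names here are assumptions, not the actual optim.py code:

import collections
import torch

class GradNormClipper:
    """Clips gradients against a threshold derived from recent grad norms.

    Sketch only: threshold = clipping_scale * running median, which matches
    the numbers printed in the WARNINGs above; window size is an assumption.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)
        self.num_batches = 0
        self.num_clipped = 0

    def clip_(self, params) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        self.num_batches += 1
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2 * median
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        # Periodic logging of q, threshold and the clipped percentage would
        # reproduce lines like:
        # "grad-norm quartiles ... threshold=5.417e+01, percent-clipped=0.0"

A median-based threshold reacts slowly to outliers, which is the point: a few huge gradient norms get scaled down without dragging the threshold up with them.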
], batch size: 99, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:45:11,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=247413.33333333334, ans=0.0 2023-12-21 21:45:15,991 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.638e+01 2.813e+01 3.001e+01 3.548e+01, threshold=5.627e+01, percent-clipped=0.0 2023-12-21 21:45:18,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=247413.33333333334, ans=0.125 2023-12-21 21:45:28,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=247480.0, ans=0.125 2023-12-21 21:45:57,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-12-21 21:45:58,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=247680.0, ans=0.2 2023-12-21 21:45:59,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=247680.0, ans=0.125 2023-12-21 21:46:02,568 INFO [train.py:886] (1/4) Epoch 8, batch 3800, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4940883.34 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:46:07,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=247746.66666666666, ans=0.125 2023-12-21 21:46:27,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=247880.0, ans=0.125 2023-12-21 21:46:32,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=247880.0, ans=0.1 2023-12-21 21:46:42,230 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:46:54,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=248013.33333333334, ans=0.125 2023-12-21 21:46:55,695 INFO [train.py:886] (1/4) Epoch 8, batch 3850, loss[loss=0.01537, audio_tagging_loss=0.01537, over 21579.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4940389.64 frames. 
], batch size: 107, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:46:59,387 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.593e+01 2.766e+01 2.914e+01 3.476e+01, threshold=5.531e+01, percent-clipped=0.0 2023-12-21 21:47:04,377 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.391e-02 2023-12-21 21:47:16,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=248213.33333333334, ans=0.1 2023-12-21 21:47:17,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=248213.33333333334, ans=0.125 2023-12-21 21:47:30,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=248280.0, ans=0.125 2023-12-21 21:47:30,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=248280.0, ans=0.2 2023-12-21 21:47:32,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248280.0, ans=0.1 2023-12-21 21:47:35,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248346.66666666666, ans=0.1 2023-12-21 21:47:38,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=248346.66666666666, ans=0.125 2023-12-21 21:47:47,211 INFO [train.py:886] (1/4) Epoch 8, batch 3900, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4946400.67 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:47:49,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=15.0 2023-12-21 21:47:54,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-12-21 21:48:07,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248546.66666666666, ans=0.1 2023-12-21 21:48:28,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=248680.0, ans=0.2 2023-12-21 21:48:30,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248680.0, ans=0.1 2023-12-21 21:48:34,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=248680.0, ans=0.0 2023-12-21 21:48:38,319 INFO [train.py:886] (1/4) Epoch 8, batch 3950, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4942764.72 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:48:38,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.78 vs. 
limit=12.0 2023-12-21 21:48:42,027 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.574e+01 2.671e+01 2.865e+01 3.681e+01, threshold=5.342e+01, percent-clipped=0.0 2023-12-21 21:48:45,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=248746.66666666666, ans=0.0 2023-12-21 21:48:55,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=248813.33333333334, ans=0.0 2023-12-21 21:49:03,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=248880.0, ans=0.2 2023-12-21 21:49:13,239 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.815e-01 2023-12-21 21:49:19,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=249013.33333333334, ans=0.125 2023-12-21 21:49:19,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=249013.33333333334, ans=0.0 2023-12-21 21:49:28,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=249013.33333333334, ans=0.125 2023-12-21 21:49:30,369 INFO [train.py:886] (1/4) Epoch 8, batch 4000, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 4948132.79 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0 2023-12-21 21:49:46,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=249146.66666666666, ans=0.0 2023-12-21 21:50:02,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249280.0, ans=0.0 2023-12-21 21:50:16,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-12-21 21:50:21,755 INFO [train.py:886] (1/4) Epoch 8, batch 4050, loss[loss=0.01552, audio_tagging_loss=0.01552, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4951744.06 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:50:27,056 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.602e+01 2.760e+01 2.948e+01 3.649e+01, threshold=5.519e+01, percent-clipped=0.0 2023-12-21 21:50:27,224 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:50:46,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2023-12-21 21:50:48,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=249546.66666666666, ans=0.125 2023-12-21 21:51:04,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=249680.0, ans=0.2 2023-12-21 21:51:13,773 INFO [train.py:886] (1/4) Epoch 8, batch 4100, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4946395.33 frames. 
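Every "ScheduledFloat: name=..., batch_count=..., ans=..." line records the current value (ans) of a scheduled hyperparameter (dropout probabilities, skip rates, bypass scale floors, and so on) as a function of the global batch_count. A piecewise-linear schedule is enough to reproduce the behaviour seen here; the class below is a sketch with illustrative breakpoints, not the scaling.py implementation:

class ScheduledFloat:
    """Piecewise-linear schedule over batch_count.

    Sketch only: icefall's scaling.py version carries more machinery; the
    interpolation below is the essential idea, and the breakpoints in the
    usage example are illustrative guesses.
    """

    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(245080.0))  # past the last breakpoint -> 0.1, as in the log

At batch_count around 2.45e5, deep into training, most scheduled values in the log sit at their final breakpoints (dropout_p at 0.1, many skip rates at 0.0), which is consistent with schedules of this shape.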
], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:51:25,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.88 vs. limit=22.5 2023-12-21 21:51:38,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=249880.0, ans=0.125 2023-12-21 21:51:47,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=249946.66666666666, ans=15.0 2023-12-21 21:51:50,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=249946.66666666666, ans=0.125 2023-12-21 21:52:05,002 INFO [train.py:886] (1/4) Epoch 8, batch 4150, loss[loss=0.01456, audio_tagging_loss=0.01456, over 24750.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4942430.66 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:52:10,483 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+01 2.710e+01 2.835e+01 2.954e+01 3.813e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-21 21:52:10,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=250080.0, ans=0.125 2023-12-21 21:52:16,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=250146.66666666666, ans=0.0 2023-12-21 21:52:36,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=250280.0, ans=0.2 2023-12-21 21:52:56,696 INFO [train.py:886] (1/4) Epoch 8, batch 4200, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4946703.32 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:53:03,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=250413.33333333334, ans=0.2 2023-12-21 21:53:11,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=250480.0, ans=0.125 2023-12-21 21:53:19,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0 2023-12-21 21:53:23,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=250546.66666666666, ans=0.125 2023-12-21 21:53:29,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=250613.33333333334, ans=0.2 2023-12-21 21:53:49,351 INFO [train.py:886] (1/4) Epoch 8, batch 4250, loss[loss=0.01678, audio_tagging_loss=0.01678, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4950713.06 frames. 
], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:53:50,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=250746.66666666666, ans=0.2 2023-12-21 21:53:54,767 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.560e+01 2.714e+01 2.924e+01 4.261e+01, threshold=5.428e+01, percent-clipped=0.0 2023-12-21 21:54:14,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=250880.0, ans=0.0 2023-12-21 21:54:31,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2023-12-21 21:54:33,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=251013.33333333334, ans=0.125 2023-12-21 21:54:35,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-12-21 21:54:41,046 INFO [train.py:886] (1/4) Epoch 8, batch 4300, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4954517.62 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:54:44,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251080.0, ans=0.1 2023-12-21 21:54:50,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=251146.66666666666, ans=0.125 2023-12-21 21:55:07,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2023-12-21 21:55:14,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=251280.0, ans=0.125 2023-12-21 21:55:15,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-12-21 21:55:19,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=251280.0, ans=0.2 2023-12-21 21:55:33,255 INFO [train.py:886] (1/4) Epoch 8, batch 4350, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4953055.97 frames. 
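The Whitening lines compare a per-module anisotropy statistic against a (possibly scheduled) limit, e.g. metric=14.09 vs. limit=15.0 above; values near or beyond the limit mark activations whose covariance has drifted far from isotropic. As an assumption about what scaling.py measures, the sketch below uses a standard whiteness statistic: the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly white features and grows toward num_channels as the covariance collapses toward rank one:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Anisotropy of the per-group feature covariance; 1.0 means fully white.

    Sketch under the stated assumption; x has shape (num_frames, num_channels).
    """
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)               # center each group
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / num_frames  # (c, c) covariance
        eigs = torch.linalg.eigvalsh(cov)             # real, ascending
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)

feats = torch.randn(1000, 512) @ torch.randn(512, 512)  # correlated features
print(whitening_metric(feats))  # well above 1.0; whitened data prints ~1.0

A module whose metric exceeds its limit presumably receives a corrective gradient toward whiter output; the log prints the comparison either way, which is why lines with metric far under the limit also appear.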
], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:55:36,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=251413.33333333334, ans=0.0 2023-12-21 21:55:37,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=251413.33333333334, ans=0.04949747468305833 2023-12-21 21:55:37,906 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.680e+01 2.895e+01 3.105e+01 4.359e+01, threshold=5.791e+01, percent-clipped=0.0 2023-12-21 21:55:38,993 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:55:52,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=251546.66666666666, ans=0.07 2023-12-21 21:55:54,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=251546.66666666666, ans=0.125 2023-12-21 21:56:02,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0 2023-12-21 21:56:19,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251680.0, ans=0.1 2023-12-21 21:56:24,820 INFO [train.py:886] (1/4) Epoch 8, batch 4400, loss[loss=0.01244, audio_tagging_loss=0.01244, over 23984.00 frames. ], tot_loss[loss=0.01577, audio_tagging_loss=0.01577, over 4948715.76 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:56:36,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=251813.33333333334, ans=0.0 2023-12-21 21:56:36,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=251813.33333333334, ans=0.05 2023-12-21 21:56:46,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=251880.0, ans=0.125 2023-12-21 21:56:46,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=251880.0, ans=0.125 2023-12-21 21:56:57,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=251946.66666666666, ans=0.125 2023-12-21 21:57:16,309 INFO [train.py:886] (1/4) Epoch 8, batch 4450, loss[loss=0.01568, audio_tagging_loss=0.01568, over 24032.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4951184.82 frames. 
], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:57:19,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=252080.0, ans=0.0 2023-12-21 21:57:19,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=252080.0, ans=0.2 2023-12-21 21:57:21,714 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.660e+01 2.808e+01 3.032e+01 4.377e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-21 21:57:54,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=252280.0, ans=0.125 2023-12-21 21:58:08,662 INFO [train.py:886] (1/4) Epoch 8, batch 4500, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4948281.32 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:58:12,646 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:58:21,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=252480.0, ans=0.0 2023-12-21 21:58:33,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=252546.66666666666, ans=0.2 2023-12-21 21:58:38,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2023-12-21 21:58:40,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=15.0 2023-12-21 21:58:44,134 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.513e-01 2023-12-21 21:58:54,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=252680.0, ans=0.125 2023-12-21 21:59:00,806 INFO [train.py:886] (1/4) Epoch 8, batch 4550, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4955369.48 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0 2023-12-21 21:59:06,128 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.575e+01 2.765e+01 2.949e+01 3.693e+01, threshold=5.529e+01, percent-clipped=0.0 2023-12-21 21:59:12,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=252813.33333333334, ans=0.125 2023-12-21 21:59:18,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=252813.33333333334, ans=0.0 2023-12-21 21:59:25,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=252880.0, ans=0.0 2023-12-21 21:59:51,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=12.0 2023-12-21 21:59:52,616 INFO [train.py:886] (1/4) Epoch 8, batch 4600, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4954950.79 frames. 
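The "WithLoss: name=...self_attn_weights, loss-sum=..." records above track an auxiliary loss attached to attention-weight tensors; loss-sum=0.000e+00 presumably means the constraint is currently slack for that module, while values such as 7.513e-01 mean it is active. The general pattern is sketched below with a purely illustrative penalty (an entropy floor); the actual penalty in scaling.py is not reproduced here:

import torch

class AttentionWeightsPenalty(torch.nn.Module):
    """Computes a small auxiliary loss on attention weights and remembers it
    for periodic "WithLoss: name=..., loss-sum=..." logging.

    Illustrative sketch only: this penalty discourages per-query attention
    entropy collapsing below 1 nat; loss-sum is 0.0 when the floor is slack.
    """

    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.loss_sum = 0.0

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (..., num_queries, num_keys), rows summing to 1
        entropy = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
        penalty = (1.0 - entropy).clamp(min=0.0).mean()
        self.loss_sum = float(penalty.detach())
        return penalty  # caller adds this (suitably scaled) to the main loss

    def log(self) -> None:
        print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")

The caller scales the returned penalty into the training loss; calling log() at intervals reproduces the lines above.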
], batch size: 99, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 21:59:53,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5 2023-12-21 21:59:56,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=253080.0, ans=0.125 2023-12-21 22:00:20,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=253213.33333333334, ans=0.125 2023-12-21 22:00:33,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=253346.66666666666, ans=0.02 2023-12-21 22:00:37,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-12-21 22:00:44,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-12-21 22:00:45,210 INFO [train.py:886] (1/4) Epoch 8, batch 4650, loss[loss=0.01675, audio_tagging_loss=0.01675, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4960836.55 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:00:50,676 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.620e+01 2.747e+01 2.927e+01 3.887e+01, threshold=5.494e+01, percent-clipped=0.0 2023-12-21 22:00:58,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=253480.0, ans=0.0 2023-12-21 22:00:59,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=253480.0, ans=0.04949747468305833 2023-12-21 22:01:19,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2023-12-21 22:01:23,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2023-12-21 22:01:23,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=253613.33333333334, ans=0.125 2023-12-21 22:01:27,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=253680.0, ans=0.125 2023-12-21 22:01:30,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=253680.0, ans=0.2 2023-12-21 22:01:33,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=253680.0, ans=0.1 2023-12-21 22:01:35,716 INFO [train.py:886] (1/4) Epoch 8, batch 4700, loss[loss=0.01582, audio_tagging_loss=0.01582, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4958723.57 frames. 
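Each train.py:886 record prints two losses: loss[...] for the current batch and tot_loss[...] as a running frame-weighted average. The fractional cumulative frame counts (e.g. "over 4958723.57 frames" just above) suggest old batches are decayed geometrically rather than summed exactly; a sketch under that assumption, with the decay factor chosen arbitrarily:

class RunningLoss:
    """Frame-weighted running loss with exponential forgetting.

    Sketch only: alpha=0.999 is an assumption; any alpha < 1 yields the
    fractional cumulative frame counts seen in the log.
    """

    def __init__(self, alpha: float = 0.999):
        self.alpha = alpha
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: int) -> None:
        # batch_loss is the mean per-frame loss on this batch
        self.loss_sum = self.alpha * self.loss_sum + batch_loss * batch_frames
        self.frames = self.alpha * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / self.frames

tracker = RunningLoss()
tracker.update(0.01525, 21198)   # per-batch numbers as printed in the log
tracker.update(0.01561, 24750)
print(f"tot_loss[loss={tracker.value:.4}, over {tracker.frames:.2f} frames.]")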
], batch size: 99, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:02:05,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=253946.66666666666, ans=0.2 2023-12-21 22:02:12,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=253946.66666666666, ans=0.125 2023-12-21 22:02:20,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=254013.33333333334, ans=0.125 2023-12-21 22:02:23,292 INFO [train.py:886] (1/4) Epoch 8, batch 4750, loss[loss=0.01558, audio_tagging_loss=0.01558, over 24750.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4955131.46 frames. ], batch size: 99, lr: 1.32e-02, grad_scale: 64.0 2023-12-21 22:02:27,781 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.666e+01 2.796e+01 2.993e+01 3.759e+01, threshold=5.593e+01, percent-clipped=0.0 2023-12-21 22:02:32,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=254146.66666666666, ans=0.0 2023-12-21 22:03:00,071 INFO [train.py:886] (1/4) Epoch 9, batch 0, loss[loss=0.03661, audio_tagging_loss=0.03661, over 23970.00 frames. ], tot_loss[loss=0.03661, audio_tagging_loss=0.03661, over 23970.00 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 64.0 2023-12-21 22:03:00,072 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 22:03:21,365 INFO [train.py:917] (1/4) Epoch 9, validation: loss=0.03498, audio_tagging_loss=0.03498, over 3737520.00 frames. 2023-12-21 22:03:21,366 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 22:03:23,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=254186.66666666666, ans=0.0 2023-12-21 22:03:39,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=254253.33333333334, ans=0.05 2023-12-21 22:03:50,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=254320.0, ans=0.125 2023-12-21 22:03:51,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=254386.66666666666, ans=0.05 2023-12-21 22:03:53,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2023-12-21 22:04:02,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=254453.33333333334, ans=0.0 2023-12-21 22:04:10,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=254453.33333333334, ans=0.2 2023-12-21 22:04:12,825 INFO [train.py:886] (1/4) Epoch 9, batch 50, loss[loss=0.01971, audio_tagging_loss=0.01971, over 25000.00 frames. ], tot_loss[loss=0.02504, audio_tagging_loss=0.02504, over 1105711.17 frames. 
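At the first batch of each epoch (train.py:909-918 above) training pauses, a full pass over the fixed 3,737,520-frame dev set produces the validation loss, and peak GPU memory is reported. A minimal sketch of that step; the batch keys and the model's (loss, frame-count) return value are assumptions, not the train.py interface:

import torch

def compute_validation_loss(model, valid_loader, device) -> float:
    """Frame-weighted mean loss over the dev set; sketch of train.py:909-918."""
    was_training = model.training
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["features"].to(device)     # assumed key: fbank feats
            labels = batch["labels"].to(device)      # assumed key: event targets
            loss, num_frames = model(feats, labels)  # assumed return pair
            tot_loss += loss.item()
            tot_frames += num_frames
    if was_training:
        model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4}, "
          f"over {tot_frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
    return tot_loss / tot_frames

Running under torch.no_grad() with model.eval() is what keeps the dev-set numbers comparable across epochs and the memory footprint from growing during validation.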
], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:04:25,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=254586.66666666666, ans=0.2 2023-12-21 22:04:34,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.15 vs. limit=8.0 2023-12-21 22:04:37,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=254653.33333333334, ans=0.125 2023-12-21 22:04:39,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=254653.33333333334, ans=0.0 2023-12-21 22:04:39,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=254653.33333333334, ans=0.0 2023-12-21 22:04:43,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=254720.0, ans=0.125 2023-12-21 22:04:45,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2023-12-21 22:04:54,278 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.647e+01 3.011e+01 3.284e+01 3.905e+01 1.113e+02, threshold=6.568e+01, percent-clipped=8.0 2023-12-21 22:05:01,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=254786.66666666666, ans=0.2 2023-12-21 22:05:01,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=254786.66666666666, ans=0.125 2023-12-21 22:05:04,578 INFO [train.py:886] (1/4) Epoch 9, batch 100, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.02149, audio_tagging_loss=0.02149, over 1963092.05 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:05:13,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=254920.0, ans=0.0 2023-12-21 22:05:34,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=255053.33333333334, ans=0.0 2023-12-21 22:05:35,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=255053.33333333334, ans=0.125 2023-12-21 22:05:37,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=255053.33333333334, ans=0.0 2023-12-21 22:05:38,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=255053.33333333334, ans=0.0 2023-12-21 22:05:43,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255120.0, ans=0.125 2023-12-21 22:05:44,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.20 vs. limit=22.5 2023-12-21 22:05:44,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.84 vs. 
limit=15.0 2023-12-21 22:05:55,434 INFO [train.py:886] (1/4) Epoch 9, batch 150, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01953, audio_tagging_loss=0.01953, over 2631914.41 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:06:11,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=255253.33333333334, ans=0.0 2023-12-21 22:06:37,353 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.677e+01 2.811e+01 2.976e+01 3.484e+01, threshold=5.622e+01, percent-clipped=0.0 2023-12-21 22:06:47,044 INFO [train.py:886] (1/4) Epoch 9, batch 200, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 3152038.67 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:06:47,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=255520.0, ans=0.2 2023-12-21 22:06:58,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=255586.66666666666, ans=0.1 2023-12-21 22:07:17,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255720.0, ans=0.1 2023-12-21 22:07:25,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=255720.0, ans=0.125 2023-12-21 22:07:29,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=255786.66666666666, ans=0.0 2023-12-21 22:07:31,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255786.66666666666, ans=0.1 2023-12-21 22:07:39,170 INFO [train.py:886] (1/4) Epoch 9, batch 250, loss[loss=0.01694, audio_tagging_loss=0.01694, over 24750.00 frames. ], tot_loss[loss=0.01752, audio_tagging_loss=0.01752, over 3555461.54 frames. ], batch size: 99, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:07:49,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255920.0, ans=0.1 2023-12-21 22:07:56,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=255920.0, ans=0.0 2023-12-21 22:08:05,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2023-12-21 22:08:10,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=256053.33333333334, ans=0.125 2023-12-21 22:08:20,988 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.572e+01 2.690e+01 2.867e+01 4.305e+01, threshold=5.380e+01, percent-clipped=0.0 2023-12-21 22:08:25,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=256120.0, ans=0.0 2023-12-21 22:08:25,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=256120.0, ans=0.0 2023-12-21 22:08:30,618 INFO [train.py:886] (1/4) Epoch 9, batch 300, loss[loss=0.02022, audio_tagging_loss=0.02022, over 24750.00 frames. 
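The grad_scale column tracks fp16 dynamic loss scaling: it reads 64.0 throughout epoch 8 and at epoch 9 batch 0, then 32.0 from epoch 9 batch 50 onward, consistent with the scaler halving after a gradient overflow; the one WARNING above with percent-clipped=8.0, a top quartile of 1.113e+02, and the threshold jumping to 6.568e+01 shows the clipper absorbing the same early-epoch turbulence. The usual AMP pattern, sketched with illustrative settings:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=64.0, growth_interval=2000)

def train_step(model, optimizer, feats, labels):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(feats, labels)   # model interface assumed as elsewhere
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # skips the step if grads overflowed
    scaler.update()                   # halves the scale after an overflow,
                                      # e.g. 64.0 -> 32.0 as in the log
    return loss.detach(), scaler.get_scale()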
], tot_loss[loss=0.01698, audio_tagging_loss=0.01698, over 3867856.62 frames. ], batch size: 99, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:08:37,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=256186.66666666666, ans=0.125 2023-12-21 22:08:39,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=256186.66666666666, ans=0.125 2023-12-21 22:08:48,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=256253.33333333334, ans=0.125 2023-12-21 22:09:00,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=256320.0, ans=0.2 2023-12-21 22:09:03,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=256386.66666666666, ans=10.0 2023-12-21 22:09:07,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=256386.66666666666, ans=0.125 2023-12-21 22:09:16,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=256453.33333333334, ans=0.125 2023-12-21 22:09:22,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=256453.33333333334, ans=0.125 2023-12-21 22:09:23,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=256520.0, ans=0.1 2023-12-21 22:09:23,829 INFO [train.py:886] (1/4) Epoch 9, batch 350, loss[loss=0.01699, audio_tagging_loss=0.01699, over 24750.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 4108166.06 frames. ], batch size: 99, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:09:26,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=256520.0, ans=0.125 2023-12-21 22:09:33,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=256586.66666666666, ans=0.0 2023-12-21 22:09:39,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-12-21 22:09:39,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2023-12-21 22:09:50,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=256653.33333333334, ans=0.2 2023-12-21 22:09:57,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256720.0, ans=0.125 2023-12-21 22:10:04,508 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.569e+01 2.774e+01 2.957e+01 3.605e+01, threshold=5.548e+01, percent-clipped=0.0 2023-12-21 22:10:15,407 INFO [train.py:886] (1/4) Epoch 9, batch 400, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4296640.07 frames. 
], batch size: 100, lr: 1.25e-02, grad_scale: 32.0 2023-12-21 22:10:38,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.94 vs. limit=22.5 2023-12-21 22:11:04,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=257120.0, ans=0.0 2023-12-21 22:11:07,310 INFO [train.py:886] (1/4) Epoch 9, batch 450, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4443667.69 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:11:19,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=257253.33333333334, ans=0.125 2023-12-21 22:11:31,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=257320.0, ans=0.125 2023-12-21 22:11:34,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=257320.0, ans=0.125 2023-12-21 22:11:48,940 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.608e+01 2.793e+01 2.951e+01 3.727e+01, threshold=5.585e+01, percent-clipped=0.0 2023-12-21 22:11:50,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=257453.33333333334, ans=0.0 2023-12-21 22:12:00,518 INFO [train.py:886] (1/4) Epoch 9, batch 500, loss[loss=0.01386, audio_tagging_loss=0.01386, over 23991.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4553024.05 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:12:12,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=257586.66666666666, ans=0.125 2023-12-21 22:12:27,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-12-21 22:12:33,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=257720.0, ans=0.04949747468305833 2023-12-21 22:12:47,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=257786.66666666666, ans=0.125 2023-12-21 22:12:51,067 INFO [train.py:886] (1/4) Epoch 9, batch 550, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4645510.27 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:13:01,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=257920.0, ans=0.2 2023-12-21 22:13:11,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=257986.66666666666, ans=0.0 2023-12-21 22:13:15,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=15.0 2023-12-21 22:13:17,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.32 vs. 
limit=22.5 2023-12-21 22:13:32,268 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.530e+01 2.671e+01 2.882e+01 3.618e+01, threshold=5.343e+01, percent-clipped=0.0 2023-12-21 22:13:43,184 INFO [train.py:886] (1/4) Epoch 9, batch 600, loss[loss=0.01837, audio_tagging_loss=0.01837, over 24750.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4712892.50 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:13:44,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=258186.66666666666, ans=0.125 2023-12-21 22:13:56,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-21 22:14:34,964 INFO [train.py:886] (1/4) Epoch 9, batch 650, loss[loss=0.01753, audio_tagging_loss=0.01753, over 25000.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4762995.31 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:14:37,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=258520.0, ans=0.125 2023-12-21 22:15:01,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=258653.33333333334, ans=0.125 2023-12-21 22:15:17,031 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.595e+01 2.763e+01 2.915e+01 3.436e+01, threshold=5.525e+01, percent-clipped=0.0 2023-12-21 22:15:24,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=258786.66666666666, ans=0.125 2023-12-21 22:15:26,594 INFO [train.py:886] (1/4) Epoch 9, batch 700, loss[loss=0.01601, audio_tagging_loss=0.01601, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4804588.70 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:15:29,468 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.584e-01 2023-12-21 22:15:36,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=258920.0, ans=0.125 2023-12-21 22:15:48,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2023-12-21 22:16:09,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=259120.0, ans=0.5 2023-12-21 22:16:16,719 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.314e-02 2023-12-21 22:16:18,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=259186.66666666666, ans=0.1 2023-12-21 22:16:19,302 INFO [train.py:886] (1/4) Epoch 9, batch 750, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4835280.14 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:16:39,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.80 vs. 
limit=15.0 2023-12-21 22:16:58,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=259386.66666666666, ans=0.0 2023-12-21 22:17:01,047 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.547e+01 2.724e+01 2.898e+01 3.402e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 22:17:04,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259453.33333333334, ans=0.1 2023-12-21 22:17:10,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259520.0, ans=0.1 2023-12-21 22:17:11,294 INFO [train.py:886] (1/4) Epoch 9, batch 800, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4866802.82 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:17:24,364 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:17:27,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=259586.66666666666, ans=0.2 2023-12-21 22:17:38,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=259653.33333333334, ans=0.125 2023-12-21 22:17:55,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-12-21 22:18:02,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2023-12-21 22:18:03,816 INFO [train.py:886] (1/4) Epoch 9, batch 850, loss[loss=0.01681, audio_tagging_loss=0.01681, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4889160.85 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:18:04,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=259853.33333333334, ans=0.125 2023-12-21 22:18:05,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-12-21 22:18:28,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-21 22:18:33,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=259986.66666666666, ans=0.125 2023-12-21 22:18:45,094 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.660e+01 2.834e+01 3.014e+01 3.541e+01, threshold=5.667e+01, percent-clipped=0.0 2023-12-21 22:18:56,057 INFO [train.py:886] (1/4) Epoch 9, batch 900, loss[loss=0.01689, audio_tagging_loss=0.01689, over 24750.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4905528.57 frames. 
], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:19:00,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=260186.66666666666, ans=0.125 2023-12-21 22:19:10,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=260253.33333333334, ans=0.125 2023-12-21 22:19:21,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260320.0, ans=0.1 2023-12-21 22:19:48,422 INFO [train.py:886] (1/4) Epoch 9, batch 950, loss[loss=0.01744, audio_tagging_loss=0.01744, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4902273.17 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:20:23,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0 2023-12-21 22:20:29,898 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.673e+01 2.825e+01 3.002e+01 3.457e+01, threshold=5.650e+01, percent-clipped=0.0 2023-12-21 22:20:31,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=260786.66666666666, ans=0.125 2023-12-21 22:20:33,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=260786.66666666666, ans=0.125 2023-12-21 22:20:39,997 INFO [train.py:886] (1/4) Epoch 9, batch 1000, loss[loss=0.01582, audio_tagging_loss=0.01582, over 22299.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4908965.25 frames. ], batch size: 107, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:20:49,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=260920.0, ans=0.0 2023-12-21 22:20:57,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-12-21 22:21:02,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=260986.66666666666, ans=0.125 2023-12-21 22:21:26,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261120.0, ans=0.1 2023-12-21 22:21:28,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=261120.0, ans=0.2 2023-12-21 22:21:32,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-12-21 22:21:32,222 INFO [train.py:886] (1/4) Epoch 9, batch 1050, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4923544.58 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0 2023-12-21 22:22:11,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2023-12-21 22:22:13,860 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.578e+01 2.720e+01 2.898e+01 3.660e+01, threshold=5.440e+01, percent-clipped=0.0 2023-12-21 22:22:23,393 INFO [train.py:886] (1/4) Epoch 9, batch 1100, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4925139.43 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:22:27,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2023-12-21 22:22:31,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=261520.0, ans=0.025 2023-12-21 22:23:06,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=12.0 2023-12-21 22:23:16,840 INFO [train.py:886] (1/4) Epoch 9, batch 1150, loss[loss=0.01683, audio_tagging_loss=0.01683, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4923705.48 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:23:22,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2023-12-21 22:23:26,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=261920.0, ans=0.0 2023-12-21 22:23:29,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=261920.0, ans=0.0 2023-12-21 22:23:48,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=262053.33333333334, ans=0.025 2023-12-21 22:23:57,573 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 2.612e+01 2.792e+01 2.985e+01 3.661e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-21 22:23:58,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=262120.0, ans=0.125 2023-12-21 22:24:07,928 INFO [train.py:886] (1/4) Epoch 9, batch 1200, loss[loss=0.01863, audio_tagging_loss=0.01863, over 24940.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4935657.30 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:24:10,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=262186.6666666667, ans=0.125 2023-12-21 22:24:19,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=262253.3333333333, ans=0.0 2023-12-21 22:24:24,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=262253.3333333333, ans=0.125 2023-12-21 22:24:25,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2023-12-21 22:24:34,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. 
2023-12-21 22:24:35,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0
2023-12-21 22:24:36,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=262386.6666666667, ans=0.125
2023-12-21 22:24:48,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0
2023-12-21 22:24:58,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=262520.0, ans=0.125
2023-12-21 22:24:59,132 INFO [train.py:886] (1/4) Epoch 9, batch 1250, loss[loss=0.01452, audio_tagging_loss=0.01452, over 21157.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4930302.54 frames. ], batch size: 107, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:24:59,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=262520.0, ans=0.1
2023-12-21 22:25:02,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=262520.0, ans=0.0
2023-12-21 22:25:04,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=262520.0, ans=0.05
2023-12-21 22:25:06,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0
2023-12-21 22:25:25,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262653.3333333333, ans=0.0
2023-12-21 22:25:39,891 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.627e+01 2.793e+01 3.069e+01 3.716e+01, threshold=5.586e+01, percent-clipped=0.0
2023-12-21 22:25:40,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=262786.6666666667, ans=0.2
2023-12-21 22:25:45,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5
2023-12-21 22:25:52,106 INFO [train.py:886] (1/4) Epoch 9, batch 1300, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4934770.55 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:26:18,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=262986.6666666667, ans=0.125
2023-12-21 22:26:22,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=263053.3333333333, ans=0.125
2023-12-21 22:26:22,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=263053.3333333333, ans=0.125
2023-12-21 22:26:24,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263053.3333333333, ans=0.1
2023-12-21 22:26:39,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=263120.0, ans=0.125
2023-12-21 22:26:42,326 INFO [train.py:886] (1/4) Epoch 9, batch 1350, loss[loss=0.01953, audio_tagging_loss=0.01953, over 24938.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4938331.22 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:26:47,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263186.6666666667, ans=0.1
2023-12-21 22:26:48,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.67 vs. limit=15.0
2023-12-21 22:26:53,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0
2023-12-21 22:27:00,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=263253.3333333333, ans=0.0
2023-12-21 22:27:08,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=263320.0, ans=0.2
2023-12-21 22:27:12,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=263320.0, ans=0.125
2023-12-21 22:27:25,417 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.597e+01 2.714e+01 2.927e+01 3.624e+01, threshold=5.427e+01, percent-clipped=0.0
2023-12-21 22:27:28,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0
2023-12-21 22:27:31,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=263453.3333333333, ans=0.125
2023-12-21 22:27:35,545 INFO [train.py:886] (1/4) Epoch 9, batch 1400, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4942567.29 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
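The recurring optim.py:484 warnings report the quartiles (min/25%/median/75%/max) of recently observed gradient norms alongside the active clipping threshold and the fraction of batches clipped. A rough sketch of that kind of bookkeeping, assuming the threshold is clipping_scale times the running median; this is a guess at the mechanism, not the actual optimizer in optim.py:

```python
import torch

def clip_with_stats(params, recent_norms, clipping_scale=2.0, window=200):
    """Clip the global grad norm against a threshold derived from the
    median of recently observed norms; returns (norm, threshold, clipped).
    Illustrative only -- the real optimizer's statistics may differ."""
    grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
    norm = torch.cat(grads).norm()
    recent_norms.append(norm.item())
    del recent_norms[:-window]  # keep a sliding window of recent norms

    sorted_norms = sorted(recent_norms)
    quartiles = [sorted_norms[int(q * (len(sorted_norms) - 1))]
                 for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]  # scale * median

    clipped = norm > threshold
    if clipped:
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / norm)  # rescale to the threshold
    print(f"grad-norm quartiles {quartiles}, threshold={threshold:.3e}, "
          f"clipped={bool(clipped)}")
    return norm, threshold, clipped
```

Under this reading, the logged quartiles hovering around 2.6e+01 with a threshold near 5.5e+01 and percent-clipped=0.0 simply say that gradients are stable and far below the clipping point at this stage of training.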
2023-12-21 22:27:43,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=263520.0, ans=0.07
2023-12-21 22:27:48,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=263586.6666666667, ans=0.125
2023-12-21 22:27:52,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0
2023-12-21 22:27:52,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=263586.6666666667, ans=0.0
2023-12-21 22:28:07,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263720.0, ans=0.1
2023-12-21 22:28:11,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=263720.0, ans=0.0
2023-12-21 22:28:12,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0
2023-12-21 22:28:14,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0
2023-12-21 22:28:23,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=263786.6666666667, ans=0.0
2023-12-21 22:28:26,532 INFO [train.py:886] (1/4) Epoch 9, batch 1450, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4949812.31 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:28:32,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=263853.3333333333, ans=0.1
2023-12-21 22:28:34,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=12.0
2023-12-21 22:28:41,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=263920.0, ans=0.125
2023-12-21 22:28:43,332 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 22:28:59,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=264053.3333333333, ans=0.125
2023-12-21 22:29:08,305 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.534e+01 2.742e+01 2.913e+01 3.478e+01, threshold=5.484e+01, percent-clipped=0.0
2023-12-21 22:29:14,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=264120.0, ans=0.125
2023-12-21 22:29:17,815 INFO [train.py:886] (1/4) Epoch 9, batch 1500, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4944938.03 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:29:18,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=264186.6666666667, ans=0.125
2023-12-21 22:29:22,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=264186.6666666667, ans=0.125
2023-12-21 22:29:27,531 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 22:29:30,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=264253.3333333333, ans=0.0
2023-12-21 22:29:31,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264253.3333333333, ans=0.125
2023-12-21 22:30:10,242 INFO [train.py:886] (1/4) Epoch 9, batch 1550, loss[loss=0.01518, audio_tagging_loss=0.01518, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4946753.07 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:30:14,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=264520.0, ans=0.95
2023-12-21 22:30:17,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=264520.0, ans=0.2
2023-12-21 22:30:39,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0
2023-12-21 22:30:51,442 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 2.715e+01 2.913e+01 3.058e+01 3.707e+01, threshold=5.826e+01, percent-clipped=0.0
2023-12-21 22:30:55,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264786.6666666667, ans=0.1
2023-12-21 22:31:00,777 INFO [train.py:886] (1/4) Epoch 9, batch 1600, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4941686.00 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:31:08,449 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.467e-02
2023-12-21 22:31:27,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264986.6666666667, ans=0.125
2023-12-21 22:31:27,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=264986.6666666667, ans=0.2
2023-12-21 22:31:32,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=265053.3333333333, ans=0.0
2023-12-21 22:31:54,078 INFO [train.py:886] (1/4) Epoch 9, batch 1650, loss[loss=0.01533, audio_tagging_loss=0.01533, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4941662.41 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:31:59,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=12.0
2023-12-21 22:32:02,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=265253.3333333333, ans=0.0
2023-12-21 22:32:22,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=265320.0, ans=0.025
2023-12-21 22:32:24,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.41 vs. limit=8.0
2023-12-21 22:32:27,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0
2023-12-21 22:32:28,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=265386.6666666667, ans=0.95
2023-12-21 22:32:28,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.94 vs. limit=22.5
2023-12-21 22:32:35,014 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.651e+01 2.793e+01 3.063e+01 3.586e+01, threshold=5.586e+01, percent-clipped=0.0
2023-12-21 22:32:46,502 INFO [train.py:886] (1/4) Epoch 9, batch 1700, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4947109.50 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:33:19,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=265720.0, ans=0.0
2023-12-21 22:33:24,827 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.823e+00
2023-12-21 22:33:31,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=265786.6666666667, ans=0.0
2023-12-21 22:33:35,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=265786.6666666667, ans=0.0
2023-12-21 22:33:37,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.09 vs. limit=22.5
2023-12-21 22:33:37,682 INFO [train.py:886] (1/4) Epoch 9, batch 1750, loss[loss=0.01484, audio_tagging_loss=0.01484, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4947314.12 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0
2023-12-21 22:33:52,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0
2023-12-21 22:34:03,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=265986.6666666667, ans=0.0
2023-12-21 22:34:05,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.71 vs. limit=22.5
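The scaling.py:1022 Whitening lines compare a per-module covariance statistic against a limit; readings above the limit (e.g. metric=28.71 vs. limit=22.5 just above) indicate that a few directions dominate the feature covariance and that the corrective whitening constraint is active. One plausible statistic with this behaviour, offered only as an illustration of the flavour of the metric, not the exact formula used in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Ratio of the mean squared covariance eigenvalue to the squared mean
    eigenvalue, averaged over channel groups. Close to 1.0 for white
    features (exactly 1 in the infinite-sample limit); it grows when a few
    directions dominate. Illustrative sketch only."""
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    # per-group covariance: (num_groups, c, c)
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
    return metric.mean()

x = torch.randn(10000, 512)        # roughly white features
print(whitening_metric(x))         # near 1.0; anisotropic features score higher
```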
2023-12-21 22:34:09,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=266053.3333333333, ans=0.0
2023-12-21 22:34:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=266053.3333333333, ans=0.0
2023-12-21 22:34:10,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=266053.3333333333, ans=0.125
2023-12-21 22:34:16,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=266053.3333333333, ans=0.125
2023-12-21 22:34:19,240 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.611e+01 2.763e+01 2.973e+01 3.566e+01, threshold=5.527e+01, percent-clipped=0.0
2023-12-21 22:34:30,207 INFO [train.py:886] (1/4) Epoch 9, batch 1800, loss[loss=0.01737, audio_tagging_loss=0.01737, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4945416.92 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0
2023-12-21 22:34:34,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266186.6666666667, ans=0.1
2023-12-21 22:34:39,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=266253.3333333333, ans=0.0
2023-12-21 22:34:45,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=266253.3333333333, ans=0.0
2023-12-21 22:34:46,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=266253.3333333333, ans=0.0
2023-12-21 22:35:04,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=266386.6666666667, ans=0.2
2023-12-21 22:35:11,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=266453.3333333333, ans=0.1
2023-12-21 22:35:15,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0
2023-12-21 22:35:21,923 INFO [train.py:886] (1/4) Epoch 9, batch 1850, loss[loss=0.01961, audio_tagging_loss=0.01961, over 24957.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4948626.94 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0
2023-12-21 22:35:24,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=266520.0, ans=0.0
2023-12-21 22:35:25,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=266520.0, ans=0.2
2023-12-21 22:35:32,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=266586.6666666667, ans=0.0
2023-12-21 22:35:51,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=266653.3333333333, ans=0.09899494936611666
2023-12-21 22:35:51,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=266653.3333333333, ans=0.2
2023-12-21 22:35:52,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.46 vs. limit=10.0
2023-12-21 22:35:56,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=266720.0, ans=0.0
2023-12-21 22:36:03,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=266720.0, ans=0.125
2023-12-21 22:36:05,788 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.665e+01 2.807e+01 3.010e+01 3.699e+01, threshold=5.613e+01, percent-clipped=0.0
2023-12-21 22:36:07,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0
2023-12-21 22:36:07,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=266786.6666666667, ans=0.125
2023-12-21 22:36:16,024 INFO [train.py:886] (1/4) Epoch 9, batch 1900, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4944224.26 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 32.0
2023-12-21 22:36:17,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0
2023-12-21 22:36:19,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=266853.3333333333, ans=0.0
2023-12-21 22:36:21,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=266853.3333333333, ans=0.125
2023-12-21 22:36:23,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=266853.3333333333, ans=0.0
2023-12-21 22:36:42,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=266986.6666666667, ans=0.0
2023-12-21 22:36:45,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0
2023-12-21 22:36:57,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=267120.0, ans=0.0
2023-12-21 22:37:06,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=267120.0, ans=0.07
2023-12-21 22:37:08,007 INFO [train.py:886] (1/4) Epoch 9, batch 1950, loss[loss=0.01704, audio_tagging_loss=0.01704, over 21655.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4938382.45 frames. ], batch size: 107, lr: 1.22e-02, grad_scale: 32.0
2023-12-21 22:37:27,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=15.0
2023-12-21 22:37:36,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=267320.0, ans=0.0
2023-12-21 22:37:47,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.613e+01 2.746e+01 2.938e+01 3.371e+01, threshold=5.492e+01, percent-clipped=0.0
2023-12-21 22:37:54,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=267453.3333333333, ans=0.2
2023-12-21 22:37:56,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0
2023-12-21 22:37:58,873 INFO [train.py:886] (1/4) Epoch 9, batch 2000, loss[loss=0.01647, audio_tagging_loss=0.01647, over 25000.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4943635.37 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0
2023-12-21 22:38:07,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=267520.0, ans=0.0
2023-12-21 22:38:18,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=267653.3333333333, ans=0.02
2023-12-21 22:38:28,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=267720.0, ans=0.0
2023-12-21 22:38:35,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=267720.0, ans=0.0
2023-12-21 22:38:42,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=267786.6666666667, ans=0.125
2023-12-21 22:38:47,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267786.6666666667, ans=0.1
2023-12-21 22:38:50,013 INFO [train.py:886] (1/4) Epoch 9, batch 2050, loss[loss=0.01567, audio_tagging_loss=0.01567, over 22039.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4950234.29 frames. ], batch size: 107, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:38:51,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=267853.3333333333, ans=0.125
2023-12-21 22:39:00,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.01 vs. limit=22.5
2023-12-21 22:39:01,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=267920.0, ans=0.125
2023-12-21 22:39:10,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=267986.6666666667, ans=0.125
2023-12-21 22:39:19,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=15.0
2023-12-21 22:39:21,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.54 vs. limit=15.0
2023-12-21 22:39:25,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=268053.3333333333, ans=0.125
2023-12-21 22:39:30,914 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.585e+01 2.736e+01 2.898e+01 3.379e+01, threshold=5.472e+01, percent-clipped=0.0
2023-12-21 22:39:41,180 INFO [train.py:886] (1/4) Epoch 9, batch 2100, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4948018.31 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:40:17,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=268386.6666666667, ans=0.2
2023-12-21 22:40:17,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=22.5
2023-12-21 22:40:32,296 INFO [train.py:886] (1/4) Epoch 9, batch 2150, loss[loss=0.0159, audio_tagging_loss=0.0159, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4948473.27 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:41:14,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5
2023-12-21 22:41:14,769 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.638e+01 2.787e+01 2.989e+01 3.388e+01, threshold=5.573e+01, percent-clipped=0.0
2023-12-21 22:41:25,795 INFO [train.py:886] (1/4) Epoch 9, batch 2200, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4946150.93 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:41:26,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=268853.3333333333, ans=0.2
2023-12-21 22:41:54,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=268986.6666666667, ans=0.125
2023-12-21 22:41:59,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=269053.3333333333, ans=0.0
2023-12-21 22:42:00,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=269053.3333333333, ans=0.0
2023-12-21 22:42:01,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=269053.3333333333, ans=0.0
2023-12-21 22:42:01,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269053.3333333333, ans=0.125
2023-12-21 22:42:02,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=269053.3333333333, ans=0.015
2023-12-21 22:42:11,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.05 vs. limit=22.5
2023-12-21 22:42:12,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0
2023-12-21 22:42:17,192 INFO [train.py:886] (1/4) Epoch 9, batch 2250, loss[loss=0.01651, audio_tagging_loss=0.01651, over 24750.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4941617.36 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:42:30,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=269253.3333333333, ans=0.125
2023-12-21 22:42:44,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=269320.0, ans=0.05
2023-12-21 22:42:48,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0
2023-12-21 22:42:58,353 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.625e+01 2.827e+01 2.998e+01 3.600e+01, threshold=5.653e+01, percent-clipped=0.0
2023-12-21 22:43:07,848 INFO [train.py:886] (1/4) Epoch 9, batch 2300, loss[loss=0.01901, audio_tagging_loss=0.01901, over 24750.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4940062.01 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:43:20,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=269586.6666666667, ans=0.0
2023-12-21 22:43:23,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0
2023-12-21 22:43:29,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=269653.3333333333, ans=0.0
2023-12-21 22:43:32,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0
2023-12-21 22:43:51,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=269786.6666666667, ans=0.0
2023-12-21 22:43:57,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=269786.6666666667, ans=0.125
2023-12-21 22:44:01,072 INFO [train.py:886] (1/4) Epoch 9, batch 2350, loss[loss=0.01803, audio_tagging_loss=0.01803, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4943391.88 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:44:01,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5
2023-12-21 22:44:18,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=269920.0, ans=0.95
2023-12-21 22:44:36,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=270053.3333333333, ans=0.0
2023-12-21 22:44:41,222 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.563e+01 2.767e+01 2.942e+01 3.408e+01, threshold=5.535e+01, percent-clipped=0.0
2023-12-21 22:44:50,841 INFO [train.py:886] (1/4) Epoch 9, batch 2400, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4949009.33 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-21 22:44:55,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=270186.6666666667, ans=0.125
2023-12-21 22:44:57,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=270186.6666666667, ans=0.0
2023-12-21 22:45:15,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270320.0, ans=0.1
2023-12-21 22:45:30,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0
2023-12-21 22:45:42,468 INFO [train.py:886] (1/4) Epoch 9, batch 2450, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4958670.00 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:45:50,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=270520.0, ans=0.015
2023-12-21 22:45:50,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=270520.0, ans=0.1
2023-12-21 22:45:51,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=270586.6666666667, ans=0.0
2023-12-21 22:46:07,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=270653.3333333333, ans=0.125
2023-12-21 22:46:08,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.07 vs. limit=22.5
2023-12-21 22:46:09,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0
2023-12-21 22:46:10,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5
2023-12-21 22:46:16,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=270720.0, ans=0.0
2023-12-21 22:46:20,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270720.0, ans=0.125
2023-12-21 22:46:22,745 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.728e+01 2.875e+01 2.985e+01 3.809e+01, threshold=5.751e+01, percent-clipped=0.0
2023-12-21 22:46:30,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=270786.6666666667, ans=0.125
2023-12-21 22:46:33,016 INFO [train.py:886] (1/4) Epoch 9, batch 2500, loss[loss=0.01848, audio_tagging_loss=0.01848, over 25000.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4958620.96 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:47:16,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=271120.0, ans=0.035
2023-12-21 22:47:25,477 INFO [train.py:886] (1/4) Epoch 9, batch 2550, loss[loss=0.01615, audio_tagging_loss=0.01615, over 25000.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4955730.98 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:47:34,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271186.6666666667, ans=0.125
2023-12-21 22:47:35,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0
2023-12-21 22:47:39,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=271253.3333333333, ans=0.125
2023-12-21 22:48:07,013 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.663e+01 2.770e+01 2.998e+01 3.753e+01, threshold=5.540e+01, percent-clipped=0.0
2023-12-21 22:48:17,925 INFO [train.py:886] (1/4) Epoch 9, batch 2600, loss[loss=0.01614, audio_tagging_loss=0.01614, over 24750.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 4952359.51 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:48:27,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271586.6666666667, ans=0.1
2023-12-21 22:48:27,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=271586.6666666667, ans=0.0
2023-12-21 22:48:34,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=271586.6666666667, ans=0.125
2023-12-21 22:48:50,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=271720.0, ans=0.125
2023-12-21 22:48:57,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=271720.0, ans=0.1
2023-12-21 22:49:05,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=271786.6666666667, ans=0.0
2023-12-21 22:49:09,002 INFO [train.py:886] (1/4) Epoch 9, batch 2650, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4948471.48 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:49:13,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=271853.3333333333, ans=0.125
2023-12-21 22:49:22,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=271920.0, ans=0.0
2023-12-21 22:49:23,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.00 vs. limit=22.5
2023-12-21 22:49:47,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=272053.3333333333, ans=0.02
2023-12-21 22:49:51,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.584e+01 2.691e+01 2.849e+01 3.428e+01, threshold=5.381e+01, percent-clipped=0.0
2023-12-21 22:49:52,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=272120.0, ans=0.5
2023-12-21 22:50:00,595 INFO [train.py:886] (1/4) Epoch 9, batch 2700, loss[loss=0.01706, audio_tagging_loss=0.01706, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4955888.67 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:50:03,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=272186.6666666667, ans=0.0
2023-12-21 22:50:27,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272320.0, ans=0.125
2023-12-21 22:50:28,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=272320.0, ans=0.2
2023-12-21 22:50:32,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=15.0
2023-12-21 22:50:50,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=272520.0, ans=0.125
2023-12-21 22:50:50,706 INFO [train.py:886] (1/4) Epoch 9, batch 2750, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4954177.31 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:50:55,192 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 22:50:55,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272520.0, ans=0.125
2023-12-21 22:50:59,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=272520.0, ans=0.05
2023-12-21 22:51:04,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=272586.6666666667, ans=0.0
2023-12-21 22:51:06,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0
2023-12-21 22:51:08,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=272586.6666666667, ans=0.125
2023-12-21 22:51:15,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=272653.3333333333, ans=0.125
2023-12-21 22:51:16,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=272653.3333333333, ans=0.0
2023-12-21 22:51:29,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272720.0, ans=0.1
2023-12-21 22:51:33,395 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.590e+01 2.730e+01 2.864e+01 3.266e+01, threshold=5.459e+01, percent-clipped=0.0
2023-12-21 22:51:42,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=272853.3333333333, ans=0.125
2023-12-21 22:51:43,037 INFO [train.py:886] (1/4) Epoch 9, batch 2800, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4952629.65 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:51:44,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.10 vs. limit=22.5
2023-12-21 22:51:54,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=272920.0, ans=0.125
2023-12-21 22:52:11,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=272986.6666666667, ans=0.125
2023-12-21 22:52:15,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=273053.3333333333, ans=0.2
2023-12-21 22:52:35,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=273186.6666666667, ans=0.0
2023-12-21 22:52:36,151 INFO [train.py:886] (1/4) Epoch 9, batch 2850, loss[loss=0.01907, audio_tagging_loss=0.01907, over 24750.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4942299.92 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:52:47,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=273253.3333333333, ans=0.125
2023-12-21 22:52:47,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273253.3333333333, ans=0.125
2023-12-21 22:53:11,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=273386.6666666667, ans=0.125
2023-12-21 22:53:17,486 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.634e+01 2.788e+01 2.936e+01 3.853e+01, threshold=5.577e+01, percent-clipped=0.0
2023-12-21 22:53:20,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=12.0
2023-12-21 22:53:21,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=273453.3333333333, ans=0.125
2023-12-21 22:53:27,618 INFO [train.py:886] (1/4) Epoch 9, batch 2900, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4941271.30 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:53:47,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=273586.6666666667, ans=0.2
2023-12-21 22:53:51,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=273653.3333333333, ans=0.125
2023-12-21 22:53:56,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=273653.3333333333, ans=0.125
2023-12-21 22:53:58,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5
2023-12-21 22:54:01,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=273720.0, ans=0.1
2023-12-21 22:54:11,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=273786.6666666667, ans=0.125
2023-12-21 22:54:18,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5
2023-12-21 22:54:19,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.02 vs. limit=22.5
2023-12-21 22:54:20,013 INFO [train.py:886] (1/4) Epoch 9, batch 2950, loss[loss=0.01715, audio_tagging_loss=0.01715, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4950011.99 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:54:28,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=273920.0, ans=0.035
2023-12-21 22:54:34,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0
2023-12-21 22:54:37,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=273920.0, ans=0.05
2023-12-21 22:54:40,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=273986.6666666667, ans=0.0
2023-12-21 22:55:00,637 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.696e+01 2.843e+01 2.959e+01 3.331e+01, threshold=5.685e+01, percent-clipped=0.0
2023-12-21 22:55:09,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0
2023-12-21 22:55:12,164 INFO [train.py:886] (1/4) Epoch 9, batch 3000, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4955934.09 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:55:12,165 INFO [train.py:909] (1/4) Computing validation loss
2023-12-21 22:55:33,475 INFO [train.py:917] (1/4) Epoch 9, validation: loss=0.03523, audio_tagging_loss=0.03523, over 3737520.00 frames.
2023-12-21 22:55:33,475 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-21 22:55:43,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=274253.3333333333, ans=0.125
2023-12-21 22:55:44,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0
2023-12-21 22:55:51,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=274253.3333333333, ans=0.0
2023-12-21 22:56:25,465 INFO [train.py:886] (1/4) Epoch 9, batch 3050, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4959053.35 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
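In the train.py:886 lines, loss[...] is the current batch while tot_loss[...] is an aggregate over roughly 5M frames, i.e. apparently a frames-weighted running average; the validation pass at batch 3000 above reports a separate loss over the fixed 3737520-frame dev set. A hypothetical reconstruction of the running-average bookkeeping (the function name and the decay constant are invented for illustration):

```python
def update_tot_loss(tot_loss_sum, tot_frames, batch_loss, batch_frames,
                    decay=0.999):
    """Decay old statistics slightly, then fold in the new batch, so that
    tot_loss tracks a long frames-weighted moving average of the per-batch
    losses. Illustrative only -- not the actual train.py bookkeeping."""
    tot_loss_sum = tot_loss_sum * decay + batch_loss * batch_frames
    tot_frames = tot_frames * decay + batch_frames
    return tot_loss_sum, tot_frames, tot_loss_sum / tot_frames

stats = (0.0, 0.0)
for batch_loss, batch_frames in [(0.01609, 25000.0), (0.01338, 25000.0)]:
    *stats, avg = update_tot_loss(*stats, batch_loss, batch_frames)
    print(f"tot_loss={avg:.5f} over {stats[1]:.2f} frames")
```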
2023-12-21 22:56:49,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274653.3333333333, ans=0.1
2023-12-21 22:56:51,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=274653.3333333333, ans=0.0
2023-12-21 22:56:56,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=274720.0, ans=0.125
2023-12-21 22:56:56,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=274720.0, ans=6.0
2023-12-21 22:57:06,216 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.681e+01 2.830e+01 3.002e+01 4.084e+01, threshold=5.659e+01, percent-clipped=0.0
2023-12-21 22:57:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274786.6666666667, ans=0.125
2023-12-21 22:57:17,845 INFO [train.py:886] (1/4) Epoch 9, batch 3100, loss[loss=0.01585, audio_tagging_loss=0.01585, over 24750.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4957482.61 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0
2023-12-21 22:57:27,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=274920.0, ans=0.125
2023-12-21 22:57:29,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0
2023-12-21 22:57:50,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=275053.3333333333, ans=0.0
2023-12-21 22:57:54,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=275053.3333333333, ans=15.0
2023-12-21 22:58:01,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=275120.0, ans=0.2
2023-12-21 22:58:03,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=275120.0, ans=0.0
2023-12-21 22:58:08,826 INFO [train.py:886] (1/4) Epoch 9, batch 3150, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4946503.15 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 22:58:16,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=275186.6666666667, ans=0.125
2023-12-21 22:58:16,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=275186.6666666667, ans=0.125
2023-12-21 22:58:18,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275253.3333333333, ans=0.1
2023-12-21 22:58:20,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=275253.3333333333, ans=0.0
2023-12-21 22:58:44,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=275386.6666666667, ans=0.125
2023-12-21 22:58:50,921 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.623e+01 2.777e+01 2.975e+01 3.499e+01, threshold=5.555e+01, percent-clipped=0.0
2023-12-21 22:59:00,384 INFO [train.py:886] (1/4) Epoch 9, batch 3200, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4944444.12 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 22:59:14,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=275586.6666666667, ans=0.0
2023-12-21 22:59:23,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=275653.3333333333, ans=0.0
2023-12-21 22:59:27,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=275653.3333333333, ans=0.125
2023-12-21 22:59:50,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=275786.6666666667, ans=0.1
2023-12-21 22:59:51,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=275786.6666666667, ans=0.125
2023-12-21 22:59:53,488 INFO [train.py:886] (1/4) Epoch 9, batch 3250, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4943334.42 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 22:59:53,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=275853.3333333333, ans=0.1
2023-12-21 23:00:14,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5
2023-12-21 23:00:19,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=275986.6666666667, ans=0.125
2023-12-21 23:00:24,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. limit=10.0
2023-12-21 23:00:33,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=276120.0, ans=0.0
2023-12-21 23:00:34,340 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.607e+01 2.752e+01 2.914e+01 3.525e+01, threshold=5.504e+01, percent-clipped=0.0
2023-12-21 23:00:34,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=276120.0, ans=0.125
2023-12-21 23:00:42,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=276186.6666666667, ans=0.2
2023-12-21 23:00:43,810 INFO [train.py:886] (1/4) Epoch 9, batch 3300, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4949096.74 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:00:51,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=276186.6666666667, ans=0.2
2023-12-21 23:00:51,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0
2023-12-21 23:01:07,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0
2023-12-21 23:01:19,964 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 23:01:33,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=276453.3333333333, ans=0.125
2023-12-21 23:01:36,163 INFO [train.py:886] (1/4) Epoch 9, batch 3350, loss[loss=0.01595, audio_tagging_loss=0.01595, over 24750.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4955684.29 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:02:13,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0
2023-12-21 23:02:16,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=276786.6666666667, ans=0.125
2023-12-21 23:02:16,738 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.658e+01 2.793e+01 2.957e+01 4.433e+01, threshold=5.585e+01, percent-clipped=0.0
2023-12-21 23:02:25,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=276853.3333333333, ans=0.0
2023-12-21 23:02:26,909 INFO [train.py:886] (1/4) Epoch 9, batch 3400, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4956550.72 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:02:27,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=276853.3333333333, ans=0.125 2023-12-21 23:02:32,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=276853.3333333333, ans=0.95 2023-12-21 23:02:38,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=276920.0, ans=10.0 2023-12-21 23:02:40,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=276920.0, ans=0.2 2023-12-21 23:02:51,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=276986.6666666667, ans=0.125 2023-12-21 23:03:10,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=277120.0, ans=0.125 2023-12-21 23:03:14,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.14 vs. limit=10.0 2023-12-21 23:03:16,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=277120.0, ans=12.0 2023-12-21 23:03:19,109 INFO [train.py:886] (1/4) Epoch 9, batch 3450, loss[loss=0.01469, audio_tagging_loss=0.01469, over 24750.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4945351.73 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:03:27,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=277253.3333333333, ans=0.125 2023-12-21 23:03:28,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=277253.3333333333, ans=0.025 2023-12-21 23:03:29,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=277253.3333333333, ans=0.125 2023-12-21 23:03:40,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=277320.0, ans=0.125 2023-12-21 23:03:53,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=277386.6666666667, ans=0.125 2023-12-21 23:03:53,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277386.6666666667, ans=0.0 2023-12-21 23:03:53,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277386.6666666667, ans=0.125 2023-12-21 23:03:58,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.66 vs. 
limit=10.0 2023-12-21 23:03:59,344 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.682e+01 2.834e+01 2.983e+01 3.671e+01, threshold=5.669e+01, percent-clipped=0.0 2023-12-21 23:04:04,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277453.3333333333, ans=0.125 2023-12-21 23:04:08,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.77 vs. limit=22.5 2023-12-21 23:04:10,953 INFO [train.py:886] (1/4) Epoch 9, batch 3500, loss[loss=0.016, audio_tagging_loss=0.016, over 22016.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4941883.31 frames. ], batch size: 107, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:04:24,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277586.6666666667, ans=0.125 2023-12-21 23:04:38,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=277653.3333333333, ans=0.2 2023-12-21 23:04:38,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=277653.3333333333, ans=0.0 2023-12-21 23:04:38,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=277653.3333333333, ans=0.0 2023-12-21 23:04:54,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=277786.6666666667, ans=15.0 2023-12-21 23:05:00,639 INFO [train.py:886] (1/4) Epoch 9, batch 3550, loss[loss=0.01668, audio_tagging_loss=0.01668, over 24750.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4942497.33 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:05:09,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277853.3333333333, ans=0.125 2023-12-21 23:05:38,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=278053.3333333333, ans=0.0 2023-12-21 23:05:41,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.34 vs. limit=22.5 2023-12-21 23:05:42,780 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.633e+01 2.767e+01 2.971e+01 3.918e+01, threshold=5.534e+01, percent-clipped=0.0 2023-12-21 23:05:48,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=278120.0, ans=0.0 2023-12-21 23:05:48,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=278120.0, ans=0.0 2023-12-21 23:05:52,249 INFO [train.py:886] (1/4) Epoch 9, batch 3600, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4941734.35 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:05:53,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. 
limit=15.0 2023-12-21 23:05:59,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2023-12-21 23:05:59,979 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.193e+00 2023-12-21 23:06:10,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=278253.3333333333, ans=0.125 2023-12-21 23:06:12,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=278320.0, ans=0.0 2023-12-21 23:06:25,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=278386.6666666667, ans=0.09899494936611666 2023-12-21 23:06:29,516 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.080e-02 2023-12-21 23:06:33,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=278453.3333333333, ans=0.2 2023-12-21 23:06:42,383 INFO [train.py:886] (1/4) Epoch 9, batch 3650, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4946029.85 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:06:47,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2023-12-21 23:06:50,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=278520.0, ans=0.0 2023-12-21 23:07:00,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=278586.6666666667, ans=0.125 2023-12-21 23:07:25,669 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.660e+01 2.835e+01 3.011e+01 3.673e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-21 23:07:26,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=278786.6666666667, ans=0.2 2023-12-21 23:07:35,200 INFO [train.py:886] (1/4) Epoch 9, batch 3700, loss[loss=0.02, audio_tagging_loss=0.02, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4949919.09 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:07:40,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=278853.3333333333, ans=0.2 2023-12-21 23:07:41,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=278853.3333333333, ans=0.125 2023-12-21 23:08:28,111 INFO [train.py:886] (1/4) Epoch 9, batch 3750, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4946358.09 frames. 
], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:08:45,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=279253.3333333333, ans=0.125 2023-12-21 23:08:52,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=279320.0, ans=0.0 2023-12-21 23:08:55,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.34 vs. limit=22.5 2023-12-21 23:09:06,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=279386.6666666667, ans=0.125 2023-12-21 23:09:09,039 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.692e+01 2.838e+01 2.996e+01 3.691e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 23:09:14,095 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.017e-02 2023-12-21 23:09:18,852 INFO [train.py:886] (1/4) Epoch 9, batch 3800, loss[loss=0.01842, audio_tagging_loss=0.01842, over 22163.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4939869.94 frames. ], batch size: 107, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 23:10:03,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2023-12-21 23:10:12,075 INFO [train.py:886] (1/4) Epoch 9, batch 3850, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 4941251.67 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:10:13,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=279853.3333333333, ans=0.035 2023-12-21 23:10:18,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=279853.3333333333, ans=0.0 2023-12-21 23:10:27,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2023-12-21 23:10:48,740 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:10:50,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=280053.3333333333, ans=0.2 2023-12-21 23:10:52,251 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.650e+01 2.787e+01 2.948e+01 3.554e+01, threshold=5.574e+01, percent-clipped=0.0 2023-12-21 23:10:54,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2023-12-21 23:11:03,179 INFO [train.py:886] (1/4) Epoch 9, batch 3900, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4941447.56 frames. 
], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:11:10,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=280186.6666666667, ans=0.0 2023-12-21 23:11:18,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=280253.3333333333, ans=0.0 2023-12-21 23:11:21,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=280253.3333333333, ans=0.0 2023-12-21 23:11:28,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.97 vs. limit=22.5 2023-12-21 23:11:30,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2023-12-21 23:11:34,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-12-21 23:11:34,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=280386.6666666667, ans=0.2 2023-12-21 23:11:35,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=280386.6666666667, ans=0.125 2023-12-21 23:11:53,505 INFO [train.py:886] (1/4) Epoch 9, batch 3950, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4945300.74 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:12:00,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=280520.0, ans=0.2 2023-12-21 23:12:06,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-12-21 23:12:24,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.15 vs. limit=22.5 2023-12-21 23:12:28,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=280720.0, ans=0.125 2023-12-21 23:12:35,030 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.621e+01 2.776e+01 2.902e+01 4.085e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 23:12:45,139 INFO [train.py:886] (1/4) Epoch 9, batch 4000, loss[loss=0.0161, audio_tagging_loss=0.0161, over 21933.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4946541.84 frames. 
], batch size: 107, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:12:53,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=280853.3333333333, ans=0.125 2023-12-21 23:13:00,288 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.164e+00 2023-12-21 23:13:10,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=280986.6666666667, ans=0.015 2023-12-21 23:13:13,800 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:13:14,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=281053.3333333333, ans=0.125 2023-12-21 23:13:25,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=281120.0, ans=0.0 2023-12-21 23:13:28,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=281120.0, ans=0.125 2023-12-21 23:13:35,308 INFO [train.py:886] (1/4) Epoch 9, batch 4050, loss[loss=0.01778, audio_tagging_loss=0.01778, over 24750.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4950137.84 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 128.0 2023-12-21 23:13:43,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281186.6666666667, ans=0.1 2023-12-21 23:13:44,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=281186.6666666667, ans=0.125 2023-12-21 23:13:45,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=281253.3333333333, ans=0.125 2023-12-21 23:13:48,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=281253.3333333333, ans=0.04949747468305833 2023-12-21 23:13:51,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2023-12-21 23:13:53,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=12.0 2023-12-21 23:14:04,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=281320.0, ans=0.0 2023-12-21 23:14:09,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=281386.6666666667, ans=0.0 2023-12-21 23:14:11,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=281386.6666666667, ans=15.0 2023-12-21 23:14:12,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. 
limit=15.0 2023-12-21 23:14:14,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=281386.6666666667, ans=0.125 2023-12-21 23:14:16,847 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.645e+01 2.783e+01 2.979e+01 3.494e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 23:14:20,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281453.3333333333, ans=0.1 2023-12-21 23:14:26,222 INFO [train.py:886] (1/4) Epoch 9, batch 4100, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4944100.20 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 128.0 2023-12-21 23:14:34,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=281520.0, ans=0.04949747468305833 2023-12-21 23:14:48,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.43 vs. limit=22.5 2023-12-21 23:14:50,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2023-12-21 23:14:51,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=281653.3333333333, ans=0.0 2023-12-21 23:14:52,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=281653.3333333333, ans=0.2 2023-12-21 23:14:52,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=281653.3333333333, ans=0.125 2023-12-21 23:15:01,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=281720.0, ans=0.125 2023-12-21 23:15:03,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2023-12-21 23:15:11,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=281786.6666666667, ans=0.125 2023-12-21 23:15:11,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-12-21 23:15:19,258 INFO [train.py:886] (1/4) Epoch 9, batch 4150, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4941739.58 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:15:32,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.21 vs. limit=22.5 2023-12-21 23:15:39,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=281986.6666666667, ans=0.2 2023-12-21 23:15:51,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=282053.3333333333, ans=0.125 2023-12-21 23:15:53,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.05 vs. 
limit=12.0 2023-12-21 23:15:54,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2023-12-21 23:16:00,692 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.634e+01 2.756e+01 2.939e+01 3.432e+01, threshold=5.512e+01, percent-clipped=0.0 2023-12-21 23:16:08,287 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.865e-03 2023-12-21 23:16:09,857 INFO [train.py:886] (1/4) Epoch 9, batch 4200, loss[loss=0.01797, audio_tagging_loss=0.01797, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4943114.24 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:17:02,313 INFO [train.py:886] (1/4) Epoch 9, batch 4250, loss[loss=0.01719, audio_tagging_loss=0.01719, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4946122.99 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:17:09,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2023-12-21 23:17:43,724 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.596e+01 2.839e+01 2.954e+01 3.918e+01, threshold=5.679e+01, percent-clipped=0.0 2023-12-21 23:17:47,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.22 vs. limit=22.5 2023-12-21 23:17:54,381 INFO [train.py:886] (1/4) Epoch 9, batch 4300, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4952834.79 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:18:10,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=282920.0, ans=0.5 2023-12-21 23:18:31,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-12-21 23:18:45,891 INFO [train.py:886] (1/4) Epoch 9, batch 4350, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4956680.99 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:18:47,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. 
limit=6.0 2023-12-21 23:19:10,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=283320.0, ans=0.07 2023-12-21 23:19:13,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=283320.0, ans=15.0 2023-12-21 23:19:29,156 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.360e+01 2.656e+01 2.784e+01 2.913e+01 3.411e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 23:19:33,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283453.3333333333, ans=0.125 2023-12-21 23:19:38,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=283520.0, ans=0.0 2023-12-21 23:19:39,186 INFO [train.py:886] (1/4) Epoch 9, batch 4400, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24246.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4947201.70 frames. ], batch size: 101, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:19:42,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=283520.0, ans=0.2 2023-12-21 23:19:42,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.99 vs. limit=22.5 2023-12-21 23:19:47,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283586.6666666667, ans=0.1 2023-12-21 23:20:06,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=283653.3333333333, ans=0.0 2023-12-21 23:20:21,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=283786.6666666667, ans=0.125 2023-12-21 23:20:27,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=283786.6666666667, ans=0.125 2023-12-21 23:20:29,342 INFO [train.py:886] (1/4) Epoch 9, batch 4450, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4943351.85 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:20:49,205 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.018e-01 2023-12-21 23:21:04,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2023-12-21 23:21:12,822 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.640e+01 2.772e+01 2.975e+01 3.500e+01, threshold=5.544e+01, percent-clipped=0.0 2023-12-21 23:21:14,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=284120.0, ans=0.0 2023-12-21 23:21:17,843 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.286e+00 2023-12-21 23:21:21,367 INFO [train.py:886] (1/4) Epoch 9, batch 4500, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24750.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4942060.70 frames. 
], batch size: 99, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:21:27,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.59 vs. limit=22.5 2023-12-21 23:21:33,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=284253.3333333333, ans=0.04949747468305833 2023-12-21 23:21:38,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-12-21 23:21:42,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=284320.0, ans=0.125 2023-12-21 23:21:43,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=284320.0, ans=0.0 2023-12-21 23:21:59,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2023-12-21 23:22:12,492 INFO [train.py:886] (1/4) Epoch 9, batch 4550, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4939848.07 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0 2023-12-21 23:22:39,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=12.0 2023-12-21 23:22:53,253 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.599e+01 2.790e+01 2.989e+01 3.611e+01, threshold=5.581e+01, percent-clipped=0.0 2023-12-21 23:22:59,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=284786.6666666667, ans=0.0 2023-12-21 23:23:01,879 INFO [train.py:886] (1/4) Epoch 9, batch 4600, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24020.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4943839.73 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:23:20,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=284920.0, ans=0.07 2023-12-21 23:23:22,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5 2023-12-21 23:23:28,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=284986.6666666667, ans=0.0 2023-12-21 23:23:34,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=285053.3333333333, ans=0.025 2023-12-21 23:23:39,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285053.3333333333, ans=0.1 2023-12-21 23:23:46,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=285120.0, ans=0.0 2023-12-21 23:23:50,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=285120.0, ans=0.125 2023-12-21 23:23:54,803 INFO [train.py:886] (1/4) Epoch 9, batch 4650, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. 
], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4950797.33 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:24:13,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.90 vs. limit=22.5 2023-12-21 23:24:19,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=285320.0, ans=0.95 2023-12-21 23:24:24,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=285386.6666666667, ans=0.125 2023-12-21 23:24:28,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=285386.6666666667, ans=0.125 2023-12-21 23:24:28,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2023-12-21 23:24:35,610 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.629e+01 2.821e+01 3.013e+01 3.684e+01, threshold=5.641e+01, percent-clipped=0.0 2023-12-21 23:24:36,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2023-12-21 23:24:37,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285453.3333333333, ans=0.125 2023-12-21 23:24:40,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=285453.3333333333, ans=0.1 2023-12-21 23:24:41,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=285453.3333333333, ans=0.125 2023-12-21 23:24:43,875 INFO [train.py:886] (1/4) Epoch 9, batch 4700, loss[loss=0.01783, audio_tagging_loss=0.01783, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4949484.76 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:24:51,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=285520.0, ans=0.015 2023-12-21 23:24:51,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-12-21 23:24:52,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=285586.6666666667, ans=0.05 2023-12-21 23:24:54,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=285586.6666666667, ans=0.5 2023-12-21 23:24:57,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=285586.6666666667, ans=0.125 2023-12-21 23:25:05,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.32 vs. 
limit=22.5 2023-12-21 23:25:12,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=285720.0, ans=0.125 2023-12-21 23:25:15,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2023-12-21 23:25:17,116 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.039e-03 2023-12-21 23:25:31,410 INFO [train.py:886] (1/4) Epoch 9, batch 4750, loss[loss=0.0136, audio_tagging_loss=0.0136, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4942499.55 frames. ], batch size: 99, lr: 1.18e-02, grad_scale: 64.0 2023-12-21 23:25:44,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=285920.0, ans=0.125 2023-12-21 23:26:08,775 INFO [train.py:886] (1/4) Epoch 10, batch 0, loss[loss=0.03731, audio_tagging_loss=0.03731, over 25000.00 frames. ], tot_loss[loss=0.03731, audio_tagging_loss=0.03731, over 25000.00 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:26:08,775 INFO [train.py:909] (1/4) Computing validation loss 2023-12-21 23:26:30,372 INFO [train.py:917] (1/4) Epoch 10, validation: loss=0.03426, audio_tagging_loss=0.03426, over 3737520.00 frames. 2023-12-21 23:26:30,373 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-21 23:26:44,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=286026.6666666667, ans=0.0 2023-12-21 23:26:46,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286026.6666666667, ans=0.1 2023-12-21 23:26:54,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=286093.3333333333, ans=0.2 2023-12-21 23:26:55,362 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 2.685e+01 2.858e+01 3.839e+01 9.905e+01, threshold=5.715e+01, percent-clipped=6.0 2023-12-21 23:27:21,834 INFO [train.py:886] (1/4) Epoch 10, batch 50, loss[loss=0.01735, audio_tagging_loss=0.01735, over 25000.00 frames. ], tot_loss[loss=0.02416, audio_tagging_loss=0.02416, over 1119903.15 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:27:39,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=286360.0, ans=0.2 2023-12-21 23:27:54,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=286493.3333333333, ans=0.07 2023-12-21 23:28:12,482 INFO [train.py:886] (1/4) Epoch 10, batch 100, loss[loss=0.0187, audio_tagging_loss=0.0187, over 25000.00 frames. ], tot_loss[loss=0.02093, audio_tagging_loss=0.02093, over 1977604.20 frames. 
], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:28:21,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=286693.3333333333, ans=0.05 2023-12-21 23:28:23,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=286693.3333333333, ans=0.125 2023-12-21 23:28:38,475 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.924e+01 3.109e+01 3.428e+01 4.349e+01, threshold=6.218e+01, percent-clipped=0.0 2023-12-21 23:28:48,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=286826.6666666667, ans=10.0 2023-12-21 23:28:58,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2023-12-21 23:29:04,977 INFO [train.py:886] (1/4) Epoch 10, batch 150, loss[loss=0.01506, audio_tagging_loss=0.01506, over 25000.00 frames. ], tot_loss[loss=0.01893, audio_tagging_loss=0.01893, over 2640876.53 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:29:13,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=287026.6666666667, ans=0.125 2023-12-21 23:29:13,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2023-12-21 23:29:39,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=287160.0, ans=0.125 2023-12-21 23:29:54,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.71 vs. limit=15.0 2023-12-21 23:29:55,929 INFO [train.py:886] (1/4) Epoch 10, batch 200, loss[loss=0.01614, audio_tagging_loss=0.01614, over 25000.00 frames. ], tot_loss[loss=0.01789, audio_tagging_loss=0.01789, over 3159689.99 frames. 
], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:29:57,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=287293.3333333333, ans=0.125 2023-12-21 23:30:06,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=287360.0, ans=0.0 2023-12-21 23:30:10,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=287360.0, ans=0.125 2023-12-21 23:30:18,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287426.6666666667, ans=0.1 2023-12-21 23:30:19,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=287426.6666666667, ans=0.125 2023-12-21 23:30:20,756 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.628e+01 2.764e+01 2.956e+01 4.225e+01, threshold=5.527e+01, percent-clipped=0.0 2023-12-21 23:30:21,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=287426.6666666667, ans=0.125 2023-12-21 23:30:23,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=287426.6666666667, ans=0.125 2023-12-21 23:30:29,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=287493.3333333333, ans=0.05 2023-12-21 23:30:47,390 INFO [train.py:886] (1/4) Epoch 10, batch 250, loss[loss=0.01847, audio_tagging_loss=0.01847, over 24750.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 3562570.77 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:30:52,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=287626.6666666667, ans=0.125 2023-12-21 23:31:04,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=287693.3333333333, ans=0.2 2023-12-21 23:31:17,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=287826.6666666667, ans=0.0 2023-12-21 23:31:22,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=287826.6666666667, ans=0.05 2023-12-21 23:31:38,987 INFO [train.py:886] (1/4) Epoch 10, batch 300, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 3869055.17 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:31:52,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-12-21 23:32:03,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.631e+01 2.795e+01 2.956e+01 3.518e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 23:32:05,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. 
limit=15.0 2023-12-21 23:32:11,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2023-12-21 23:32:12,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=288160.0, ans=0.2 2023-12-21 23:32:14,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=288160.0, ans=0.0 2023-12-21 23:32:24,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=288226.6666666667, ans=0.0 2023-12-21 23:32:28,479 INFO [train.py:886] (1/4) Epoch 10, batch 350, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4100573.06 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:32:29,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=288293.3333333333, ans=0.2 2023-12-21 23:32:52,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.73 vs. limit=10.0 2023-12-21 23:33:14,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5 2023-12-21 23:33:15,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=288560.0, ans=0.1 2023-12-21 23:33:20,891 INFO [train.py:886] (1/4) Epoch 10, batch 400, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4292000.61 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:33:32,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-12-21 23:33:34,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=288693.3333333333, ans=0.015 2023-12-21 23:33:38,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=288693.3333333333, ans=0.1 2023-12-21 23:33:45,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.69 vs. limit=22.5 2023-12-21 23:33:47,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.616e+01 2.753e+01 2.907e+01 3.389e+01, threshold=5.507e+01, percent-clipped=0.0 2023-12-21 23:34:11,445 INFO [train.py:886] (1/4) Epoch 10, batch 450, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01576, audio_tagging_loss=0.01576, over 4440779.00 frames. 
], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:34:44,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=289160.0, ans=0.125 2023-12-21 23:34:46,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=289160.0, ans=0.07 2023-12-21 23:34:55,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=289226.6666666667, ans=0.1 2023-12-21 23:35:03,881 INFO [train.py:886] (1/4) Epoch 10, batch 500, loss[loss=0.01723, audio_tagging_loss=0.01723, over 21608.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4554773.99 frames. ], batch size: 107, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:35:05,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289293.3333333333, ans=0.1 2023-12-21 23:35:18,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=289360.0, ans=0.125 2023-12-21 23:35:26,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=289426.6666666667, ans=0.0 2023-12-21 23:35:30,833 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.565e+01 2.709e+01 2.854e+01 3.600e+01, threshold=5.419e+01, percent-clipped=0.0 2023-12-21 23:35:41,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=289493.3333333333, ans=0.0 2023-12-21 23:35:47,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=289560.0, ans=0.0 2023-12-21 23:35:56,485 INFO [train.py:886] (1/4) Epoch 10, batch 550, loss[loss=0.016, audio_tagging_loss=0.016, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4644891.51 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:35:59,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=289626.6666666667, ans=0.035 2023-12-21 23:36:06,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=289693.3333333333, ans=0.125 2023-12-21 23:36:06,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=289693.3333333333, ans=0.125 2023-12-21 23:36:06,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=12.0 2023-12-21 23:36:09,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=289693.3333333333, ans=0.125 2023-12-21 23:36:12,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=289693.3333333333, ans=0.125 2023-12-21 23:36:14,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=289760.0, ans=0.0 2023-12-21 23:36:16,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. 
limit=15.0 2023-12-21 23:36:23,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=289760.0, ans=0.125 2023-12-21 23:36:42,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-12-21 23:36:45,293 INFO [train.py:886] (1/4) Epoch 10, batch 600, loss[loss=0.01506, audio_tagging_loss=0.01506, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4709516.32 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:36:50,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=289960.0, ans=0.125 2023-12-21 23:36:56,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=290026.6666666667, ans=0.0 2023-12-21 23:37:10,375 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.605e+01 2.757e+01 2.995e+01 3.479e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 23:37:33,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2023-12-21 23:37:34,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=290226.6666666667, ans=0.125 2023-12-21 23:37:36,074 INFO [train.py:886] (1/4) Epoch 10, batch 650, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4750966.99 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:37:54,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-21 23:38:05,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=290426.6666666667, ans=0.05 2023-12-21 23:38:26,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2023-12-21 23:38:27,737 INFO [train.py:886] (1/4) Epoch 10, batch 700, loss[loss=0.01621, audio_tagging_loss=0.01621, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4796223.31 frames. 
], batch size: 100, lr: 1.12e-02, grad_scale: 64.0 2023-12-21 23:38:35,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290626.6666666667, ans=0.1 2023-12-21 23:38:52,700 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 2.671e+01 2.861e+01 3.072e+01 3.885e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 23:39:02,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=290826.6666666667, ans=0.0 2023-12-21 23:39:14,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=290893.3333333333, ans=0.125 2023-12-21 23:39:16,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=290893.3333333333, ans=0.0 2023-12-21 23:39:18,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=290960.0, ans=0.125 2023-12-21 23:39:18,740 INFO [train.py:886] (1/4) Epoch 10, batch 750, loss[loss=0.01674, audio_tagging_loss=0.01674, over 22962.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4829030.33 frames. ], batch size: 107, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:39:57,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=291160.0, ans=0.125 2023-12-21 23:40:10,527 INFO [train.py:886] (1/4) Epoch 10, batch 800, loss[loss=0.01689, audio_tagging_loss=0.01689, over 25000.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4858449.85 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:40:25,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2023-12-21 23:40:34,600 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.609e+01 2.792e+01 2.929e+01 3.584e+01, threshold=5.584e+01, percent-clipped=0.0 2023-12-21 23:40:45,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=291493.3333333333, ans=0.125 2023-12-21 23:40:47,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=291493.3333333333, ans=0.0 2023-12-21 23:40:53,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=291560.0, ans=0.1 2023-12-21 23:40:59,714 INFO [train.py:886] (1/4) Epoch 10, batch 850, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4876900.21 frames. 
], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:41:15,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=291693.3333333333, ans=0.125 2023-12-21 23:41:18,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=291693.3333333333, ans=0.2 2023-12-21 23:41:43,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=291893.3333333333, ans=0.2 2023-12-21 23:41:45,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=291893.3333333333, ans=0.125 2023-12-21 23:41:51,247 INFO [train.py:886] (1/4) Epoch 10, batch 900, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4897256.95 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:42:09,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292026.6666666667, ans=0.1 2023-12-21 23:42:16,470 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.658e+01 2.802e+01 2.964e+01 3.575e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 23:42:17,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292093.3333333333, ans=0.1 2023-12-21 23:42:19,309 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.325e-02 2023-12-21 23:42:36,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=292226.6666666667, ans=0.0 2023-12-21 23:42:42,912 INFO [train.py:886] (1/4) Epoch 10, batch 950, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24750.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4906043.85 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:42:59,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=292360.0, ans=0.125 2023-12-21 23:43:03,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=292426.6666666667, ans=0.0 2023-12-21 23:43:04,513 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:43:32,954 INFO [train.py:886] (1/4) Epoch 10, batch 1000, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 4912747.35 frames. 
], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:43:37,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=292626.6666666667, ans=0.125 2023-12-21 23:43:37,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=292626.6666666667, ans=0.2 2023-12-21 23:43:38,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=292626.6666666667, ans=0.125 2023-12-21 23:43:52,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=292760.0, ans=0.0 2023-12-21 23:43:54,589 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.011e+00 2023-12-21 23:43:55,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292760.0, ans=0.1 2023-12-21 23:43:58,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2023-12-21 23:43:59,054 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.595e+01 2.757e+01 2.971e+01 3.553e+01, threshold=5.515e+01, percent-clipped=0.0 2023-12-21 23:44:05,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2023-12-21 23:44:09,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=292826.6666666667, ans=0.0 2023-12-21 23:44:15,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.62 vs. limit=22.5 2023-12-21 23:44:18,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2023-12-21 23:44:22,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=292893.3333333333, ans=0.125 2023-12-21 23:44:24,854 INFO [train.py:886] (1/4) Epoch 10, batch 1050, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4920270.30 frames. 
], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:44:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=292960.0, ans=0.125 2023-12-21 23:44:35,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=293026.6666666667, ans=0.125 2023-12-21 23:44:38,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293026.6666666667, ans=0.0 2023-12-21 23:44:46,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=293093.3333333333, ans=0.125 2023-12-21 23:44:57,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=293160.0, ans=0.0 2023-12-21 23:44:59,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=293160.0, ans=0.125 2023-12-21 23:45:16,153 INFO [train.py:886] (1/4) Epoch 10, batch 1100, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4928556.54 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:45:29,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=293360.0, ans=0.0 2023-12-21 23:45:37,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=293360.0, ans=0.125 2023-12-21 23:45:44,966 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.604e+01 2.781e+01 2.959e+01 3.663e+01, threshold=5.562e+01, percent-clipped=0.0 2023-12-21 23:45:48,168 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.869e+00 2023-12-21 23:45:54,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=293493.3333333333, ans=0.125 2023-12-21 23:45:59,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2023-12-21 23:46:11,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0 2023-12-21 23:46:12,039 INFO [train.py:886] (1/4) Epoch 10, batch 1150, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4936676.29 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:46:24,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293693.3333333333, ans=0.0 2023-12-21 23:46:25,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. 
limit=15.0 2023-12-21 23:46:41,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=293826.6666666667, ans=0.1 2023-12-21 23:47:02,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=293960.0, ans=0.0 2023-12-21 23:47:03,897 INFO [train.py:886] (1/4) Epoch 10, batch 1200, loss[loss=0.01361, audio_tagging_loss=0.01361, over 25000.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4944781.14 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:47:18,174 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.079e-02 2023-12-21 23:47:23,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=294093.3333333333, ans=0.1 2023-12-21 23:47:28,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.576e+01 2.725e+01 2.862e+01 3.373e+01, threshold=5.450e+01, percent-clipped=0.0 2023-12-21 23:47:54,688 INFO [train.py:886] (1/4) Epoch 10, batch 1250, loss[loss=0.0144, audio_tagging_loss=0.0144, over 24750.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4945723.17 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:47:59,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2023-12-21 23:48:05,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=15.0 2023-12-21 23:48:10,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-12-21 23:48:18,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=294426.6666666667, ans=0.125 2023-12-21 23:48:18,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=294426.6666666667, ans=0.125 2023-12-21 23:48:31,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=294493.3333333333, ans=0.125 2023-12-21 23:48:36,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=22.5 2023-12-21 23:48:40,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294560.0, ans=0.1 2023-12-21 23:48:46,966 INFO [train.py:886] (1/4) Epoch 10, batch 1300, loss[loss=0.01576, audio_tagging_loss=0.01576, over 24750.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4939355.88 frames. 
], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:48:50,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294626.6666666667, ans=0.1 2023-12-21 23:48:59,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294693.3333333333, ans=0.1 2023-12-21 23:49:11,956 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.209e+00 2023-12-21 23:49:13,636 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.699e+01 2.817e+01 2.948e+01 3.406e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-21 23:49:19,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.41 vs. limit=15.0 2023-12-21 23:49:35,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=294893.3333333333, ans=0.125 2023-12-21 23:49:37,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=294893.3333333333, ans=0.125 2023-12-21 23:49:39,368 INFO [train.py:886] (1/4) Epoch 10, batch 1350, loss[loss=0.01375, audio_tagging_loss=0.01375, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4941602.34 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 128.0 2023-12-21 23:49:43,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=294960.0, ans=0.0 2023-12-21 23:50:18,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2023-12-21 23:50:27,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=295226.6666666667, ans=0.125 2023-12-21 23:50:30,337 INFO [train.py:886] (1/4) Epoch 10, batch 1400, loss[loss=0.01785, audio_tagging_loss=0.01785, over 24750.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4946690.83 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:50:30,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=295293.3333333333, ans=0.125 2023-12-21 23:50:45,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295360.0, ans=0.1 2023-12-21 23:50:55,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295426.6666666667, ans=0.1 2023-12-21 23:50:57,860 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.578e+01 2.757e+01 2.921e+01 3.435e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 23:51:00,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=295493.3333333333, ans=0.2 2023-12-21 23:51:16,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. 
limit=15.0 2023-12-21 23:51:19,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2023-12-21 23:51:23,347 INFO [train.py:886] (1/4) Epoch 10, batch 1450, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4950444.35 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:51:23,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=295626.6666666667, ans=0.1 2023-12-21 23:51:36,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=295693.3333333333, ans=0.125 2023-12-21 23:51:42,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0 2023-12-21 23:52:14,524 INFO [train.py:886] (1/4) Epoch 10, batch 1500, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4952100.00 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:52:33,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296026.6666666667, ans=0.1 2023-12-21 23:52:34,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296026.6666666667, ans=0.1 2023-12-21 23:52:41,514 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 2.575e+01 2.789e+01 2.982e+01 3.364e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 23:52:51,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=296160.0, ans=15.0 2023-12-21 23:52:55,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=296160.0, ans=0.0 2023-12-21 23:53:06,797 INFO [train.py:886] (1/4) Epoch 10, batch 1550, loss[loss=0.01842, audio_tagging_loss=0.01842, over 24949.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4953257.58 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:53:07,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=296293.3333333333, ans=0.02 2023-12-21 23:53:07,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296293.3333333333, ans=0.1 2023-12-21 23:53:08,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=296293.3333333333, ans=0.125 2023-12-21 23:53:12,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=296293.3333333333, ans=0.125 2023-12-21 23:53:18,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. 
limit=15.0 2023-12-21 23:53:25,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=296360.0, ans=0.125 2023-12-21 23:53:28,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-12-21 23:53:32,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=296426.6666666667, ans=0.0 2023-12-21 23:53:34,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=296426.6666666667, ans=0.0 2023-12-21 23:53:36,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=296493.3333333333, ans=0.5 2023-12-21 23:53:52,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0 2023-12-21 23:53:59,460 INFO [train.py:886] (1/4) Epoch 10, batch 1600, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4950561.18 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:54:01,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=296626.6666666667, ans=0.0 2023-12-21 23:54:14,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=296693.3333333333, ans=0.125 2023-12-21 23:54:15,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=296693.3333333333, ans=22.5 2023-12-21 23:54:17,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.32 vs. limit=6.0 2023-12-21 23:54:21,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.14 vs. limit=22.5 2023-12-21 23:54:25,589 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.673e+01 2.852e+01 3.027e+01 3.338e+01, threshold=5.705e+01, percent-clipped=0.0 2023-12-21 23:54:33,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=296826.6666666667, ans=0.125 2023-12-21 23:54:36,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2023-12-21 23:54:49,781 INFO [train.py:886] (1/4) Epoch 10, batch 1650, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4949018.82 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:55:13,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.33 vs. 
limit=22.5 2023-12-21 23:55:23,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297160.0, ans=0.125 2023-12-21 23:55:25,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=297160.0, ans=0.125 2023-12-21 23:55:42,888 INFO [train.py:886] (1/4) Epoch 10, batch 1700, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4950958.78 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:55:53,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=297360.0, ans=0.125 2023-12-21 23:55:57,455 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:56:10,400 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.587e+01 2.708e+01 2.875e+01 3.627e+01, threshold=5.416e+01, percent-clipped=0.0 2023-12-21 23:56:32,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=297626.6666666667, ans=0.0 2023-12-21 23:56:33,977 INFO [train.py:886] (1/4) Epoch 10, batch 1750, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4953847.22 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:57:17,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=297893.3333333333, ans=10.0 2023-12-21 23:57:24,857 INFO [train.py:886] (1/4) Epoch 10, batch 1800, loss[loss=0.0182, audio_tagging_loss=0.0182, over 24750.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4952124.11 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:57:26,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=297960.0, ans=0.0 2023-12-21 23:57:27,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=15.0 2023-12-21 23:57:34,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=298026.6666666667, ans=0.125 2023-12-21 23:57:40,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298026.6666666667, ans=0.1 2023-12-21 23:57:52,156 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.629e+01 2.799e+01 2.963e+01 3.676e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-21 23:57:57,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=298160.0, ans=0.0 2023-12-21 23:58:17,347 INFO [train.py:886] (1/4) Epoch 10, batch 1850, loss[loss=0.02328, audio_tagging_loss=0.02328, over 24943.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4954988.01 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:58:39,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.45 vs. 
limit=15.0 2023-12-21 23:58:52,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=298493.3333333333, ans=0.0 2023-12-21 23:58:54,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2023-12-21 23:59:02,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=298560.0, ans=0.0 2023-12-21 23:59:07,467 INFO [train.py:886] (1/4) Epoch 10, batch 1900, loss[loss=0.01655, audio_tagging_loss=0.01655, over 24750.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4949410.67 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:59:24,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=298693.3333333333, ans=0.0 2023-12-21 23:59:30,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=298760.0, ans=0.125 2023-12-21 23:59:31,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=298760.0, ans=0.125 2023-12-21 23:59:34,344 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.713e+01 2.863e+01 3.091e+01 4.533e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-21 23:59:34,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=298760.0, ans=0.125 2023-12-21 23:59:45,386 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.396e-01 2023-12-21 23:59:54,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=298893.3333333333, ans=0.125 2023-12-21 23:59:54,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2023-12-21 23:59:54,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-21 23:59:55,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=298893.3333333333, ans=0.95 2023-12-21 23:59:59,051 INFO [train.py:886] (1/4) Epoch 10, batch 1950, loss[loss=0.01552, audio_tagging_loss=0.01552, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4949013.54 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:00:00,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. 
limit=12.0 2023-12-22 00:00:03,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=298960.0, ans=0.2 2023-12-22 00:00:06,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298960.0, ans=0.125 2023-12-22 00:00:07,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299026.6666666667, ans=0.125 2023-12-22 00:00:26,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=299093.3333333333, ans=0.125 2023-12-22 00:00:28,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2023-12-22 00:00:36,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=299160.0, ans=0.0 2023-12-22 00:00:47,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=299226.6666666667, ans=0.125 2023-12-22 00:00:50,513 INFO [train.py:886] (1/4) Epoch 10, batch 2000, loss[loss=0.01866, audio_tagging_loss=0.01866, over 25000.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4949806.84 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:01:02,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=299360.0, ans=0.1 2023-12-22 00:01:04,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=299360.0, ans=0.125 2023-12-22 00:01:11,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=299426.6666666667, ans=15.0 2023-12-22 00:01:15,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0 2023-12-22 00:01:16,431 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.618e+01 2.723e+01 2.914e+01 3.556e+01, threshold=5.446e+01, percent-clipped=0.0 2023-12-22 00:01:20,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=299493.3333333333, ans=0.125 2023-12-22 00:01:36,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=299560.0, ans=0.2 2023-12-22 00:01:42,017 INFO [train.py:886] (1/4) Epoch 10, batch 2050, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4953403.35 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:01:46,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=299626.6666666667, ans=0.125 2023-12-22 00:01:52,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.84 vs. 
limit=22.5 2023-12-22 00:01:57,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=299693.3333333333, ans=0.09899494936611666 2023-12-22 00:02:05,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=299760.0, ans=0.125 2023-12-22 00:02:05,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2023-12-22 00:02:08,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0 2023-12-22 00:02:33,764 INFO [train.py:886] (1/4) Epoch 10, batch 2100, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4949746.63 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:02:41,604 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.049e-02 2023-12-22 00:02:44,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300026.6666666667, ans=0.1 2023-12-22 00:02:45,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300026.6666666667, ans=0.1 2023-12-22 00:03:00,779 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.587e+01 2.718e+01 2.864e+01 3.394e+01, threshold=5.437e+01, percent-clipped=0.0 2023-12-22 00:03:04,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.86 vs. limit=22.5 2023-12-22 00:03:24,902 INFO [train.py:886] (1/4) Epoch 10, batch 2150, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4958057.05 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:03:31,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=300293.3333333333, ans=0.125 2023-12-22 00:03:42,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=300360.0, ans=0.0 2023-12-22 00:03:50,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=300426.6666666667, ans=0.125 2023-12-22 00:03:52,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-12-22 00:04:01,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=300493.3333333333, ans=0.1 2023-12-22 00:04:06,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300560.0, ans=0.1 2023-12-22 00:04:12,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300560.0, ans=0.125 2023-12-22 00:04:17,936 INFO [train.py:886] (1/4) Epoch 10, batch 2200, loss[loss=0.01562, audio_tagging_loss=0.01562, over 24750.00 frames. 
], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4952126.11 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:04:26,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2023-12-22 00:04:28,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2023-12-22 00:04:35,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=300693.3333333333, ans=0.125 2023-12-22 00:04:38,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=300760.0, ans=0.0 2023-12-22 00:04:42,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300760.0, ans=0.125 2023-12-22 00:04:43,833 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.615e+01 2.771e+01 2.942e+01 3.456e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 00:04:52,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=300826.6666666667, ans=0.0 2023-12-22 00:04:53,203 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.200e-01 2023-12-22 00:05:00,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300893.3333333333, ans=0.1 2023-12-22 00:05:06,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=300893.3333333333, ans=0.125 2023-12-22 00:05:09,188 INFO [train.py:886] (1/4) Epoch 10, batch 2250, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4952464.98 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:05:09,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=300960.0, ans=0.04949747468305833 2023-12-22 00:05:23,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=301026.6666666667, ans=0.125 2023-12-22 00:05:57,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=301226.6666666667, ans=0.05 2023-12-22 00:05:57,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=301226.6666666667, ans=0.125 2023-12-22 00:05:59,830 INFO [train.py:886] (1/4) Epoch 10, batch 2300, loss[loss=0.01806, audio_tagging_loss=0.01806, over 24750.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4952860.31 frames. 
], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:06:07,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=301293.3333333333, ans=0.5 2023-12-22 00:06:09,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301360.0, ans=0.1 2023-12-22 00:06:10,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=301360.0, ans=0.0 2023-12-22 00:06:26,822 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.627e+01 2.751e+01 2.919e+01 3.578e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-22 00:06:50,948 INFO [train.py:886] (1/4) Epoch 10, batch 2350, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4951429.19 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:06:58,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301626.6666666667, ans=0.1 2023-12-22 00:07:03,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301693.3333333333, ans=0.1 2023-12-22 00:07:07,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=301693.3333333333, ans=0.0 2023-12-22 00:07:09,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-12-22 00:07:16,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301760.0, ans=0.125 2023-12-22 00:07:23,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-12-22 00:07:37,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=301893.3333333333, ans=0.125 2023-12-22 00:07:42,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=301960.0, ans=0.0 2023-12-22 00:07:42,758 INFO [train.py:886] (1/4) Epoch 10, batch 2400, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4955727.50 frames. 
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:07:59,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302026.6666666667, ans=0.1 2023-12-22 00:08:08,764 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 2.611e+01 2.780e+01 2.950e+01 3.631e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-22 00:08:09,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302093.3333333333, ans=0.125 2023-12-22 00:08:22,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=302226.6666666667, ans=0.0 2023-12-22 00:08:31,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=302226.6666666667, ans=0.0 2023-12-22 00:08:33,178 INFO [train.py:886] (1/4) Epoch 10, batch 2450, loss[loss=0.01752, audio_tagging_loss=0.01752, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4955436.31 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:08:51,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=302360.0, ans=0.0 2023-12-22 00:08:51,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=302360.0, ans=0.125 2023-12-22 00:08:55,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-12-22 00:09:01,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=302426.6666666667, ans=0.0 2023-12-22 00:09:25,547 INFO [train.py:886] (1/4) Epoch 10, batch 2500, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4952865.97 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:09:29,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302626.6666666667, ans=0.125 2023-12-22 00:09:29,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-12-22 00:09:46,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.40 vs. limit=22.5 2023-12-22 00:09:52,766 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 2.737e+01 2.858e+01 3.085e+01 3.601e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 00:10:03,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302826.6666666667, ans=0.1 2023-12-22 00:10:16,835 INFO [train.py:886] (1/4) Epoch 10, batch 2550, loss[loss=0.01647, audio_tagging_loss=0.01647, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4948244.01 frames. 
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:10:32,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=303026.6666666667, ans=0.125 2023-12-22 00:10:39,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.15 vs. limit=22.5 2023-12-22 00:11:04,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303226.6666666667, ans=0.1 2023-12-22 00:11:08,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=303293.3333333333, ans=0.125 2023-12-22 00:11:08,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303293.3333333333, ans=0.1 2023-12-22 00:11:08,869 INFO [train.py:886] (1/4) Epoch 10, batch 2600, loss[loss=0.01306, audio_tagging_loss=0.01306, over 22542.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4940659.88 frames. ], batch size: 107, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:11:10,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2023-12-22 00:11:36,001 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.658e+01 2.842e+01 3.002e+01 3.670e+01, threshold=5.683e+01, percent-clipped=0.0 2023-12-22 00:11:36,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=303426.6666666667, ans=0.125 2023-12-22 00:11:44,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=303493.3333333333, ans=0.1 2023-12-22 00:11:50,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=303560.0, ans=0.125 2023-12-22 00:12:00,776 INFO [train.py:886] (1/4) Epoch 10, batch 2650, loss[loss=0.01554, audio_tagging_loss=0.01554, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4944796.37 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:12:25,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=303760.0, ans=10.0 2023-12-22 00:12:30,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=303826.6666666667, ans=0.125 2023-12-22 00:12:44,938 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:12:46,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=303893.3333333333, ans=0.09899494936611666 2023-12-22 00:12:47,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=303893.3333333333, ans=0.125 2023-12-22 00:12:51,225 INFO [train.py:886] (1/4) Epoch 10, batch 2700, loss[loss=0.01526, audio_tagging_loss=0.01526, over 25000.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4946719.79 frames. 
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:13:04,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=304026.6666666667, ans=0.0 2023-12-22 00:13:06,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=304026.6666666667, ans=0.0 2023-12-22 00:13:18,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=304093.3333333333, ans=0.0 2023-12-22 00:13:18,951 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.625e+01 2.781e+01 2.947e+01 3.660e+01, threshold=5.563e+01, percent-clipped=0.0 2023-12-22 00:13:22,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=304160.0, ans=0.09899494936611666 2023-12-22 00:13:36,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=304226.6666666667, ans=0.0 2023-12-22 00:13:38,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=304226.6666666667, ans=0.0 2023-12-22 00:13:40,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=304226.6666666667, ans=0.0 2023-12-22 00:13:44,518 INFO [train.py:886] (1/4) Epoch 10, batch 2750, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4947384.70 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:13:45,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=304293.3333333333, ans=0.0 2023-12-22 00:13:52,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=304293.3333333333, ans=0.125 2023-12-22 00:13:54,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=304360.0, ans=0.125 2023-12-22 00:14:25,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=304560.0, ans=0.125 2023-12-22 00:14:30,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304560.0, ans=0.1 2023-12-22 00:14:35,054 INFO [train.py:886] (1/4) Epoch 10, batch 2800, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4946949.31 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:14:49,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. 
limit=15.0 2023-12-22 00:15:00,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=304760.0, ans=0.125 2023-12-22 00:15:01,760 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.658e+01 2.805e+01 2.937e+01 3.602e+01, threshold=5.609e+01, percent-clipped=0.0 2023-12-22 00:15:03,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=304760.0, ans=0.125 2023-12-22 00:15:07,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=304826.6666666667, ans=0.125 2023-12-22 00:15:08,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304826.6666666667, ans=0.1 2023-12-22 00:15:16,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=304826.6666666667, ans=0.2 2023-12-22 00:15:17,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-12-22 00:15:21,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304893.3333333333, ans=0.1 2023-12-22 00:15:27,328 INFO [train.py:886] (1/4) Epoch 10, batch 2850, loss[loss=0.0179, audio_tagging_loss=0.0179, over 24750.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4942922.98 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:15:37,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=305026.6666666667, ans=0.125 2023-12-22 00:15:51,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305093.3333333333, ans=0.1 2023-12-22 00:15:53,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.17 vs. limit=22.5 2023-12-22 00:16:06,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=305160.0, ans=0.0 2023-12-22 00:16:08,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=305226.6666666667, ans=0.2 2023-12-22 00:16:13,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=305226.6666666667, ans=0.125 2023-12-22 00:16:14,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=305226.6666666667, ans=0.0 2023-12-22 00:16:19,442 INFO [train.py:886] (1/4) Epoch 10, batch 2900, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4946998.58 frames. 
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:16:19,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=305293.3333333333, ans=0.0 2023-12-22 00:16:22,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=305293.3333333333, ans=0.0 2023-12-22 00:16:22,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=22.5 2023-12-22 00:16:27,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305293.3333333333, ans=0.1 2023-12-22 00:16:43,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=305426.6666666667, ans=0.125 2023-12-22 00:16:43,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=305426.6666666667, ans=0.125 2023-12-22 00:16:44,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=305426.6666666667, ans=0.125 2023-12-22 00:16:45,546 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.645e+01 2.819e+01 2.957e+01 3.644e+01, threshold=5.638e+01, percent-clipped=0.0 2023-12-22 00:16:56,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.30 vs. limit=10.0 2023-12-22 00:17:10,130 INFO [train.py:886] (1/4) Epoch 10, batch 2950, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4940188.82 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:17:33,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305760.0, ans=0.125 2023-12-22 00:17:49,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=305826.6666666667, ans=0.0 2023-12-22 00:18:03,088 INFO [train.py:886] (1/4) Epoch 10, batch 3000, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4943002.13 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:18:03,089 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 00:18:24,625 INFO [train.py:917] (1/4) Epoch 10, validation: loss=0.03417, audio_tagging_loss=0.03417, over 3737520.00 frames. 
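The INFO [train.py:886] records above carry the per-batch training state: epoch, batch index, the loss of the current batch, the running tot_loss, batch size, learning rate, and the fp16 grad_scale. Because these records wrap across physical lines in this file, any postprocessing has to match against the whole text rather than line by line. Below is a minimal stand-alone sketch of such a parser; it is not part of icefall, assumes only the record format visible in this log, and the names it introduces (RECORD, summarize) are illustrative.

# Minimal sketch (not part of icefall): parse the per-batch summary records
# that train.py:886 writes into this log and print the running tot_loss,
# learning rate, and fp16 grad_scale per batch.
import re
import sys

# Record layout copied from the log lines above. Records wrap across
# physical lines in this file, so the pattern is applied to the whole
# text with re.DOTALL rather than line by line.
RECORD = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), "
    r"loss\[loss=(?P<loss>[\d.e+-]+), audio_tagging_loss=[\d.e+-]+, "
    r"over [\d.]+ frames\.\s*\], "
    r"tot_loss\[loss=(?P<tot_loss>[\d.e+-]+),.*?\], "
    r"batch size: (?P<bs>\d+), lr: (?P<lr>[\d.e+-]+), grad_scale: (?P<gs>[\d.]+)",
    re.DOTALL,
)

def summarize(path):
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        text = f.read()
    rows = [m.groupdict() for m in RECORD.finditer(text)]
    for r in rows:
        print(f"epoch {r['epoch']:>3}  batch {r['batch']:>5}  "
              f"tot_loss {float(r['tot_loss']):.5f}  "
              f"lr {float(r['lr']):.2e}  grad_scale {r['gs']}")
    if rows:
        print(f"{len(rows)} records, tot_loss "
              f"{float(rows[0]['tot_loss']):.5f} -> "
              f"{float(rows[-1]['tot_loss']):.5f}")

if __name__ == "__main__":
    summarize(sys.argv[1])

Invoked as python summarize_log.py <log-file>, it would, for the stretch of epoch 10 shown here, report tot_loss hovering around 0.0150-0.0154 while lr decays from 1.12e-02 to 1.08e-02, with grad_scale briefly doubling from 64.0 to 128.0 around batch 1350 and falling back to 64.0 by batch 1400, the usual signature of dynamic loss scaling. The pattern deliberately skips the validation records (train.py:917), which use a different layout.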
2023-12-22 00:18:24,626 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 00:18:24,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305960.0, ans=0.125 2023-12-22 00:18:34,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=306026.6666666667, ans=0.125 2023-12-22 00:18:39,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=306026.6666666667, ans=0.125 2023-12-22 00:18:41,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=306026.6666666667, ans=0.125 2023-12-22 00:18:49,712 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.598e+01 2.703e+01 2.864e+01 3.269e+01, threshold=5.407e+01, percent-clipped=0.0 2023-12-22 00:19:03,394 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:19:09,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=306226.6666666667, ans=0.125 2023-12-22 00:19:14,563 INFO [train.py:886] (1/4) Epoch 10, batch 3050, loss[loss=0.01477, audio_tagging_loss=0.01477, over 21729.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4944431.34 frames. ], batch size: 107, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:19:22,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=306293.3333333333, ans=0.125 2023-12-22 00:19:24,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=306360.0, ans=10.0 2023-12-22 00:19:30,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.83 vs. limit=15.0 2023-12-22 00:19:31,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.96 vs. limit=22.5 2023-12-22 00:19:46,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=306493.3333333333, ans=0.1 2023-12-22 00:20:07,361 INFO [train.py:886] (1/4) Epoch 10, batch 3100, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4952135.51 frames. 
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:20:09,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=306626.6666666667, ans=0.125 2023-12-22 00:20:23,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306693.3333333333, ans=0.1 2023-12-22 00:20:28,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=306760.0, ans=0.125 2023-12-22 00:20:34,052 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.718e+01 2.843e+01 3.043e+01 4.179e+01, threshold=5.686e+01, percent-clipped=0.0 2023-12-22 00:20:58,238 INFO [train.py:886] (1/4) Epoch 10, batch 3150, loss[loss=0.01558, audio_tagging_loss=0.01558, over 24750.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4947929.12 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:21:01,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306960.0, ans=0.1 2023-12-22 00:21:02,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=306960.0, ans=0.125 2023-12-22 00:21:14,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=12.0 2023-12-22 00:21:20,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=307093.3333333333, ans=0.0 2023-12-22 00:21:23,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=307093.3333333333, ans=0.0 2023-12-22 00:21:37,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2023-12-22 00:21:50,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2023-12-22 00:21:50,529 INFO [train.py:886] (1/4) Epoch 10, batch 3200, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24079.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4947436.83 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:22:08,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307360.0, ans=0.1 2023-12-22 00:22:10,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=307360.0, ans=0.2 2023-12-22 00:22:18,055 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.605e+01 2.808e+01 3.019e+01 3.529e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-22 00:22:18,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307426.6666666667, ans=0.1 2023-12-22 00:22:21,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. 
limit=10.0 2023-12-22 00:22:26,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=307493.3333333333, ans=0.0 2023-12-22 00:22:33,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=15.0 2023-12-22 00:22:40,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=307560.0, ans=0.0 2023-12-22 00:22:42,807 INFO [train.py:886] (1/4) Epoch 10, batch 3250, loss[loss=0.01778, audio_tagging_loss=0.01778, over 24750.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4945610.28 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:23:04,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=307760.0, ans=0.125 2023-12-22 00:23:04,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=307760.0, ans=0.2 2023-12-22 00:23:06,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307760.0, ans=0.125 2023-12-22 00:23:07,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307760.0, ans=0.1 2023-12-22 00:23:34,264 INFO [train.py:886] (1/4) Epoch 10, batch 3300, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4940645.80 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:23:53,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308026.6666666667, ans=0.1 2023-12-22 00:23:57,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2023-12-22 00:23:58,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=308093.3333333333, ans=0.0 2023-12-22 00:24:01,368 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.643e+01 2.728e+01 2.904e+01 3.479e+01, threshold=5.455e+01, percent-clipped=0.0 2023-12-22 00:24:22,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-12-22 00:24:26,096 INFO [train.py:886] (1/4) Epoch 10, batch 3350, loss[loss=0.01685, audio_tagging_loss=0.01685, over 21381.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4943693.38 frames. ], batch size: 107, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:24:31,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=308293.3333333333, ans=0.2 2023-12-22 00:24:31,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=308293.3333333333, ans=0.0 2023-12-22 00:24:33,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. 
limit=15.0 2023-12-22 00:24:43,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=308360.0, ans=0.125 2023-12-22 00:24:50,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-12-22 00:24:57,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=308493.3333333333, ans=0.0 2023-12-22 00:24:57,426 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.691e-01 2023-12-22 00:24:58,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=308493.3333333333, ans=0.2 2023-12-22 00:25:10,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2023-12-22 00:25:16,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=308626.6666666667, ans=0.0 2023-12-22 00:25:17,548 INFO [train.py:886] (1/4) Epoch 10, batch 3400, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4950711.09 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 128.0 2023-12-22 00:25:35,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.35 vs. limit=22.5 2023-12-22 00:25:44,947 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.624e+01 2.786e+01 2.955e+01 3.614e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 00:25:56,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=308826.6666666667, ans=0.125 2023-12-22 00:25:57,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5 2023-12-22 00:26:09,917 INFO [train.py:886] (1/4) Epoch 10, batch 3450, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4948720.81 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:26:10,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=308960.0, ans=0.04949747468305833 2023-12-22 00:26:23,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=309026.6666666667, ans=0.125 2023-12-22 00:26:29,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309026.6666666667, ans=0.1 2023-12-22 00:26:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=309093.3333333333, ans=0.09899494936611666 2023-12-22 00:27:00,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=309226.6666666667, ans=12.0 2023-12-22 00:27:02,322 INFO [train.py:886] (1/4) Epoch 10, batch 3500, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. 
], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4946085.80 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:27:15,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=309360.0, ans=0.2 2023-12-22 00:27:24,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=309426.6666666667, ans=0.0 2023-12-22 00:27:30,741 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.637e+01 2.779e+01 3.001e+01 4.121e+01, threshold=5.558e+01, percent-clipped=0.0 2023-12-22 00:27:31,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=309426.6666666667, ans=0.1 2023-12-22 00:27:36,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-12-22 00:27:40,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=309493.3333333333, ans=0.125 2023-12-22 00:27:46,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=309560.0, ans=0.1 2023-12-22 00:27:46,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=309560.0, ans=0.125 2023-12-22 00:27:54,239 INFO [train.py:886] (1/4) Epoch 10, batch 3550, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4941952.50 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:28:16,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=309760.0, ans=0.07 2023-12-22 00:28:22,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:28:23,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=309760.0, ans=0.125 2023-12-22 00:28:31,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=309826.6666666667, ans=0.5 2023-12-22 00:28:45,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.37 vs. limit=6.0 2023-12-22 00:28:45,837 INFO [train.py:886] (1/4) Epoch 10, batch 3600, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4943718.38 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:28:47,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=309960.0, ans=0.0 2023-12-22 00:28:48,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. 
limit=22.5 2023-12-22 00:29:14,192 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.609e+01 2.736e+01 2.893e+01 3.548e+01, threshold=5.471e+01, percent-clipped=0.0 2023-12-22 00:29:22,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2023-12-22 00:29:24,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=12.0 2023-12-22 00:29:30,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=310226.6666666667, ans=0.05 2023-12-22 00:29:37,930 INFO [train.py:886] (1/4) Epoch 10, batch 3650, loss[loss=0.0139, audio_tagging_loss=0.0139, over 21885.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4943031.38 frames. ], batch size: 107, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:29:45,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=310293.3333333333, ans=0.0 2023-12-22 00:29:48,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310360.0, ans=0.1 2023-12-22 00:29:48,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=310360.0, ans=0.125 2023-12-22 00:29:55,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=310360.0, ans=0.125 2023-12-22 00:30:02,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=310426.6666666667, ans=0.125 2023-12-22 00:30:04,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.44 vs. limit=22.5 2023-12-22 00:30:21,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=310560.0, ans=0.0 2023-12-22 00:30:28,849 INFO [train.py:886] (1/4) Epoch 10, batch 3700, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4947778.00 frames. 
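
(The optim.py warnings print quartiles of recent gradient norms together with a clipping threshold and the fraction of batches clipped. A minimal sketch of quartile-driven clipping follows; tying the threshold to the median with a fixed multiplier, the window size, and the function name are all assumptions for illustration, not icefall's actual optim.py.)

    import torch

    def clip_by_quartile(params, norm_history, multiplier: float = 2.0,
                         window: int = 128):
        """Clip gradients against a threshold derived from recent grad norms."""
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        norm_history.append(norm.item())
        del norm_history[:-window]  # keep a sliding window of recent norms
        q = torch.quantile(torch.tensor(norm_history),
                           torch.tensor([0.25, 0.50, 0.75]))
        threshold = multiplier * q[1]  # assumed: threshold = 2 x median
        clipped = bool(norm > threshold)
        if clipped:
            for g in grads:
                g.mul_(threshold / norm)  # rescale gradients in place
        return norm, q, threshold, clipped

With percent-clipped=0.0 throughout the warnings above, the observed norms stay below the reported threshold and the gradients pass through unscaled.
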
], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:30:37,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=310626.6666666667, ans=0.2 2023-12-22 00:30:48,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=310693.3333333333, ans=15.0 2023-12-22 00:30:57,720 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.651e+01 2.800e+01 2.999e+01 3.516e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 00:30:58,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=310760.0, ans=0.95 2023-12-22 00:30:58,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=310760.0, ans=0.2 2023-12-22 00:30:59,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=310826.6666666667, ans=0.2 2023-12-22 00:31:16,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=310893.3333333333, ans=0.04949747468305833 2023-12-22 00:31:21,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=310960.0, ans=0.125 2023-12-22 00:31:22,337 INFO [train.py:886] (1/4) Epoch 10, batch 3750, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24016.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4942211.57 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:31:26,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=310960.0, ans=0.0 2023-12-22 00:31:38,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=311026.6666666667, ans=0.035 2023-12-22 00:31:51,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-22 00:32:07,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=311226.6666666667, ans=0.125 2023-12-22 00:32:13,555 INFO [train.py:886] (1/4) Epoch 10, batch 3800, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4934070.30 frames. 
], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:32:14,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=311293.3333333333, ans=10.0 2023-12-22 00:32:28,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311360.0, ans=0.125 2023-12-22 00:32:31,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=311360.0, ans=0.0 2023-12-22 00:32:31,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=311360.0, ans=0.09899494936611666 2023-12-22 00:32:36,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=311426.6666666667, ans=0.125 2023-12-22 00:32:41,154 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 2.644e+01 2.770e+01 2.975e+01 3.634e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-22 00:32:50,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-12-22 00:33:05,085 INFO [train.py:886] (1/4) Epoch 10, batch 3850, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4934286.92 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:33:06,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=311626.6666666667, ans=0.2 2023-12-22 00:33:16,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=311693.3333333333, ans=0.0 2023-12-22 00:33:24,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=311693.3333333333, ans=0.2 2023-12-22 00:33:38,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.60 vs. limit=22.5 2023-12-22 00:33:49,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=311893.3333333333, ans=0.0 2023-12-22 00:33:58,130 INFO [train.py:886] (1/4) Epoch 10, batch 3900, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4937787.73 frames. 
], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:34:01,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=311960.0, ans=0.2 2023-12-22 00:34:07,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=312026.6666666667, ans=0.125 2023-12-22 00:34:12,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=312026.6666666667, ans=0.125 2023-12-22 00:34:13,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=312026.6666666667, ans=0.1 2023-12-22 00:34:25,853 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 2.618e+01 2.786e+01 2.971e+01 3.570e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 00:34:30,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312160.0, ans=0.0 2023-12-22 00:34:42,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=12.0 2023-12-22 00:34:49,004 INFO [train.py:886] (1/4) Epoch 10, batch 3950, loss[loss=0.01557, audio_tagging_loss=0.01557, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4943161.96 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:34:49,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=312293.3333333333, ans=0.125 2023-12-22 00:34:53,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=312293.3333333333, ans=0.2 2023-12-22 00:35:16,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312426.6666666667, ans=0.125 2023-12-22 00:35:38,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2023-12-22 00:35:40,666 INFO [train.py:886] (1/4) Epoch 10, batch 4000, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4951225.32 frames. 
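
(Each scaling.py:213 line reports a ScheduledFloat: a hyperparameter such as a dropout probability or skip rate whose value is a function of batch_count. A piecewise-linear schedule reproduces that behaviour; the sketch and the (0, 0.3) -> (20000, 0.1) breakpoints below are assumed example values, not the module's actual definition.)

    # Generic piecewise-linear schedule keyed on batch count.
    class PiecewiseLinear:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    # Linear interpolation between neighbouring breakpoints.
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # Example: a dropout_p that anneals from 0.3 to 0.1 and then stays flat,
    # matching the pattern of early 0.3 values and later 0.1 values in the log.
    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(0.0), dropout_p(10000.0), dropout_p(312026.0))  # 0.3 0.2 0.1
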
], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:35:54,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=312693.3333333333, ans=0.035 2023-12-22 00:35:56,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312693.3333333333, ans=0.125 2023-12-22 00:35:56,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=312693.3333333333, ans=15.0 2023-12-22 00:36:03,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=312760.0, ans=0.125 2023-12-22 00:36:09,004 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.644e+01 2.805e+01 2.924e+01 3.481e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 00:36:14,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=312826.6666666667, ans=0.125 2023-12-22 00:36:17,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312826.6666666667, ans=0.125 2023-12-22 00:36:29,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312893.3333333333, ans=0.1 2023-12-22 00:36:29,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=312893.3333333333, ans=0.0 2023-12-22 00:36:32,401 INFO [train.py:886] (1/4) Epoch 10, batch 4050, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4944399.23 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:36:44,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=313026.6666666667, ans=0.2 2023-12-22 00:36:46,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=313026.6666666667, ans=0.125 2023-12-22 00:36:51,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313093.3333333333, ans=0.125 2023-12-22 00:36:57,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=313093.3333333333, ans=0.125 2023-12-22 00:37:00,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=313093.3333333333, ans=0.125 2023-12-22 00:37:23,798 INFO [train.py:886] (1/4) Epoch 10, batch 4100, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4946096.22 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:37:24,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-12-22 00:37:35,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.21 vs. 
limit=10.0 2023-12-22 00:37:51,842 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+01 2.742e+01 2.860e+01 3.064e+01 3.458e+01, threshold=5.720e+01, percent-clipped=0.0 2023-12-22 00:38:10,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=313560.0, ans=0.125 2023-12-22 00:38:16,639 INFO [train.py:886] (1/4) Epoch 10, batch 4150, loss[loss=0.01615, audio_tagging_loss=0.01615, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4947355.44 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:38:20,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=313626.6666666667, ans=0.125 2023-12-22 00:38:30,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=313693.3333333333, ans=0.0 2023-12-22 00:38:52,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=313826.6666666667, ans=0.125 2023-12-22 00:39:01,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=313893.3333333333, ans=10.0 2023-12-22 00:39:03,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=313893.3333333333, ans=0.0 2023-12-22 00:39:05,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=8.0 2023-12-22 00:39:07,571 INFO [train.py:886] (1/4) Epoch 10, batch 4200, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4951402.29 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:39:14,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2023-12-22 00:39:29,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=15.0 2023-12-22 00:39:30,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=314093.3333333333, ans=0.125 2023-12-22 00:39:36,004 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 2.632e+01 2.766e+01 2.962e+01 3.622e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-22 00:39:45,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=314160.0, ans=0.0 2023-12-22 00:39:55,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=314226.6666666667, ans=0.0 2023-12-22 00:40:00,010 INFO [train.py:886] (1/4) Epoch 10, batch 4250, loss[loss=0.01498, audio_tagging_loss=0.01498, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4950135.04 frames. 
], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:40:18,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=314360.0, ans=0.125 2023-12-22 00:40:19,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=314426.6666666667, ans=0.0 2023-12-22 00:40:30,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=314493.3333333333, ans=0.125 2023-12-22 00:40:36,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=314493.3333333333, ans=0.0 2023-12-22 00:40:42,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=314560.0, ans=10.0 2023-12-22 00:40:51,773 INFO [train.py:886] (1/4) Epoch 10, batch 4300, loss[loss=0.01632, audio_tagging_loss=0.01632, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4950618.80 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:41:08,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-12-22 00:41:19,556 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.660e+01 2.835e+01 2.994e+01 3.565e+01, threshold=5.669e+01, percent-clipped=0.0 2023-12-22 00:41:36,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314893.3333333333, ans=0.1 2023-12-22 00:41:37,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=314893.3333333333, ans=15.0 2023-12-22 00:41:39,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.47 vs. limit=22.5 2023-12-22 00:41:43,498 INFO [train.py:886] (1/4) Epoch 10, batch 4350, loss[loss=0.01568, audio_tagging_loss=0.01568, over 21891.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4951340.04 frames. ], batch size: 107, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:41:52,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=315026.6666666667, ans=0.125 2023-12-22 00:41:53,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=315026.6666666667, ans=0.0 2023-12-22 00:41:58,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-12-22 00:42:01,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=315026.6666666667, ans=0.125 2023-12-22 00:42:22,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=315160.0, ans=0.2 2023-12-22 00:42:26,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=315226.6666666667, ans=0.125 2023-12-22 00:42:35,506 INFO [train.py:886] (1/4) Epoch 10, batch 4400, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. 
], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4943691.33 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:42:44,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2023-12-22 00:42:51,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=315360.0, ans=0.0 2023-12-22 00:43:04,209 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.662e+01 2.809e+01 2.979e+01 4.012e+01, threshold=5.619e+01, percent-clipped=0.0 2023-12-22 00:43:12,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=315493.3333333333, ans=0.025 2023-12-22 00:43:13,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=315493.3333333333, ans=0.125 2023-12-22 00:43:27,638 INFO [train.py:886] (1/4) Epoch 10, batch 4450, loss[loss=0.01605, audio_tagging_loss=0.01605, over 22565.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4940853.88 frames. ], batch size: 107, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:43:51,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315760.0, ans=0.1 2023-12-22 00:43:51,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315760.0, ans=0.1 2023-12-22 00:44:02,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=315826.6666666667, ans=0.5 2023-12-22 00:44:02,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.24 vs. limit=15.0 2023-12-22 00:44:19,737 INFO [train.py:886] (1/4) Epoch 10, batch 4500, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4944678.81 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:44:26,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=315960.0, ans=0.0 2023-12-22 00:44:29,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=316026.6666666667, ans=0.2 2023-12-22 00:44:31,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=316026.6666666667, ans=0.125 2023-12-22 00:44:33,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-12-22 00:44:47,478 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+01 2.672e+01 2.824e+01 2.974e+01 3.593e+01, threshold=5.647e+01, percent-clipped=0.0 2023-12-22 00:44:58,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=316160.0, ans=0.0 2023-12-22 00:45:03,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.89 vs. 
limit=15.0 2023-12-22 00:45:12,140 INFO [train.py:886] (1/4) Epoch 10, batch 4550, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4942219.10 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:45:26,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=316360.0, ans=0.035 2023-12-22 00:45:31,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316426.6666666667, ans=0.1 2023-12-22 00:45:32,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=316426.6666666667, ans=0.0 2023-12-22 00:45:49,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=316493.3333333333, ans=0.0 2023-12-22 00:46:04,006 INFO [train.py:886] (1/4) Epoch 10, batch 4600, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4950252.27 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:46:30,858 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.537e+01 2.757e+01 2.967e+01 3.317e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-22 00:46:35,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=316826.6666666667, ans=0.125 2023-12-22 00:46:43,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=316826.6666666667, ans=0.0 2023-12-22 00:46:46,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2023-12-22 00:46:54,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316893.3333333333, ans=0.1 2023-12-22 00:46:55,679 INFO [train.py:886] (1/4) Epoch 10, batch 4650, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24128.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4949678.10 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:46:56,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-12-22 00:46:57,791 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:46:59,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=316960.0, ans=0.2 2023-12-22 00:47:11,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=317026.6666666667, ans=0.125 2023-12-22 00:47:23,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.43 vs. 
limit=22.5 2023-12-22 00:47:26,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=317160.0, ans=0.125 2023-12-22 00:47:28,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2023-12-22 00:47:38,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=317226.6666666667, ans=0.125 2023-12-22 00:47:46,031 INFO [train.py:886] (1/4) Epoch 10, batch 4700, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4939665.73 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:47:48,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=317293.3333333333, ans=0.07 2023-12-22 00:48:07,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2023-12-22 00:48:07,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=317426.6666666667, ans=0.0 2023-12-22 00:48:09,460 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:48:12,609 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.720e+01 2.864e+01 3.016e+01 3.730e+01, threshold=5.728e+01, percent-clipped=0.0 2023-12-22 00:48:14,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=317493.3333333333, ans=0.125 2023-12-22 00:48:17,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-12-22 00:48:28,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=317560.0, ans=0.0 2023-12-22 00:48:31,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=317560.0, ans=0.2 2023-12-22 00:48:32,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=317626.6666666667, ans=0.0 2023-12-22 00:48:33,410 INFO [train.py:886] (1/4) Epoch 10, batch 4750, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4942929.20 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:48:39,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=317626.6666666667, ans=0.125 2023-12-22 00:48:40,777 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.508e-03 2023-12-22 00:49:10,271 INFO [train.py:886] (1/4) Epoch 11, batch 0, loss[loss=0.04151, audio_tagging_loss=0.04151, over 21257.00 frames. ], tot_loss[loss=0.04151, audio_tagging_loss=0.04151, over 21257.00 frames. 
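
(The Whitening messages compare a per-module statistic against a limit; entries where metric exceeds limit, as in several self_attn and conv_module lines here, indicate activations whose covariance has drifted from isotropic. One standard whiteness measure is sketched below: it equals 1.0 exactly when the channel covariance is a multiple of the identity and grows as the eigenvalue spectrum spreads. This is an assumed illustration, not scaling.py's implementation.)

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels) activations.

        Returns num_channels * sum(l_i^2) / (sum(l_i))^2 over the eigenvalues
        l_i of the channel covariance: always >= 1, with equality iff the
        covariance is isotropic (perfectly 'white')."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]  # channel covariance matrix
        return cov.shape[0] * torch.trace(cov @ cov) / torch.trace(cov) ** 2

    torch.manual_seed(0)
    white = torch.randn(20000, 512)   # nearly isotropic activations
    collapsed = white.clone()
    collapsed[:, :8] *= 20.0          # a few channels dominate the covariance
    print(whitening_metric(white))      # close to 1.0, well under any limit here
    print(whitening_metric(collapsed))  # ~50: the regime the log lines flag
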
], batch size: 107, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:49:10,272 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 00:49:23,796 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.4002, 3.0115, 2.9171, 3.0870, 3.7029, 3.4641, 3.8142, 2.3360], device='cuda:1') 2023-12-22 00:49:30,817 INFO [train.py:917] (1/4) Epoch 11, validation: loss=0.03405, audio_tagging_loss=0.03405, over 3737520.00 frames. 2023-12-22 00:49:30,817 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 00:49:34,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=317733.3333333333, ans=0.125 2023-12-22 00:49:51,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=317866.6666666667, ans=0.0 2023-12-22 00:50:04,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0 2023-12-22 00:50:07,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=317933.3333333333, ans=8.0 2023-12-22 00:50:21,203 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.480e-01 2023-12-22 00:50:22,921 INFO [train.py:886] (1/4) Epoch 11, batch 50, loss[loss=0.01832, audio_tagging_loss=0.01832, over 23985.00 frames. ], tot_loss[loss=0.02378, audio_tagging_loss=0.02378, over 1114023.00 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:50:34,002 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+01 2.913e+01 3.271e+01 4.041e+01 1.011e+02, threshold=6.542e+01, percent-clipped=6.0 2023-12-22 00:50:41,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=318133.3333333333, ans=0.125 2023-12-22 00:50:54,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=318266.6666666667, ans=0.125 2023-12-22 00:51:06,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=318333.3333333333, ans=0.1 2023-12-22 00:51:13,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2023-12-22 00:51:14,410 INFO [train.py:886] (1/4) Epoch 11, batch 100, loss[loss=0.01889, audio_tagging_loss=0.01889, over 25000.00 frames. ], tot_loss[loss=0.0208, audio_tagging_loss=0.0208, over 1965151.18 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:51:44,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2023-12-22 00:52:06,789 INFO [train.py:886] (1/4) Epoch 11, batch 150, loss[loss=0.01517, audio_tagging_loss=0.01517, over 25000.00 frames. ], tot_loss[loss=0.01897, audio_tagging_loss=0.01897, over 2626850.63 frames. 
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:52:09,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318733.3333333333, ans=0.125 2023-12-22 00:52:17,110 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.830e+01 2.997e+01 3.217e+01 3.667e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 00:52:25,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=318800.0, ans=0.125 2023-12-22 00:52:26,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-12-22 00:52:44,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=318933.3333333333, ans=0.2 2023-12-22 00:52:46,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=319000.0, ans=0.125 2023-12-22 00:52:47,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-12-22 00:52:58,315 INFO [train.py:886] (1/4) Epoch 11, batch 200, loss[loss=0.01918, audio_tagging_loss=0.01918, over 25000.00 frames. ], tot_loss[loss=0.01772, audio_tagging_loss=0.01772, over 3147845.55 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:52:58,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=319066.6666666667, ans=0.1 2023-12-22 00:53:20,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2023-12-22 00:53:26,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-12-22 00:53:30,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=319266.6666666667, ans=0.125 2023-12-22 00:53:31,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=319266.6666666667, ans=0.125 2023-12-22 00:53:32,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=319266.6666666667, ans=0.125 2023-12-22 00:53:37,116 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:53:44,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-12-22 00:53:48,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=12.0 2023-12-22 00:53:50,009 INFO [train.py:886] (1/4) Epoch 11, batch 250, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01696, audio_tagging_loss=0.01696, over 3555177.59 frames. 
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:54:01,136 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.668e+01 2.780e+01 2.958e+01 3.295e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-22 00:54:12,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=319533.3333333333, ans=0.2 2023-12-22 00:54:14,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=319533.3333333333, ans=0.025 2023-12-22 00:54:21,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=319600.0, ans=0.125 2023-12-22 00:54:24,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319600.0, ans=0.1 2023-12-22 00:54:25,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=319600.0, ans=0.125 2023-12-22 00:54:42,156 INFO [train.py:886] (1/4) Epoch 11, batch 300, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 3860652.79 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:54:44,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=319733.3333333333, ans=0.125 2023-12-22 00:55:03,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=319866.6666666667, ans=0.125 2023-12-22 00:55:08,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=319866.6666666667, ans=0.125 2023-12-22 00:55:09,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5 2023-12-22 00:55:32,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.97 vs. limit=22.5 2023-12-22 00:55:36,166 INFO [train.py:886] (1/4) Epoch 11, batch 350, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4097570.54 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:55:43,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. 
limit=15.0 2023-12-22 00:55:48,027 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.599e+01 2.795e+01 2.968e+01 3.574e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-22 00:55:55,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=320133.3333333333, ans=0.125 2023-12-22 00:56:09,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320266.6666666667, ans=0.125 2023-12-22 00:56:17,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320333.3333333333, ans=0.125 2023-12-22 00:56:22,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=320333.3333333333, ans=0.125 2023-12-22 00:56:26,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2023-12-22 00:56:28,445 INFO [train.py:886] (1/4) Epoch 11, batch 400, loss[loss=0.01678, audio_tagging_loss=0.01678, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4286878.54 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:56:29,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320400.0, ans=0.1 2023-12-22 00:56:32,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=320400.0, ans=0.125 2023-12-22 00:56:38,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=320466.6666666667, ans=0.0 2023-12-22 00:56:46,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=320466.6666666667, ans=0.07 2023-12-22 00:56:53,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320533.3333333333, ans=0.1 2023-12-22 00:56:54,602 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:57:01,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=320600.0, ans=0.0 2023-12-22 00:57:03,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-12-22 00:57:14,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=320666.6666666667, ans=0.125 2023-12-22 00:57:19,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=320733.3333333333, ans=0.0 2023-12-22 00:57:20,461 INFO [train.py:886] (1/4) Epoch 11, batch 450, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4439451.71 frames. 
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:57:32,229 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.613e+01 2.762e+01 2.922e+01 3.563e+01, threshold=5.524e+01, percent-clipped=0.0 2023-12-22 00:57:38,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=320800.0, ans=0.2 2023-12-22 00:57:40,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=320866.6666666667, ans=0.0 2023-12-22 00:57:50,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=320933.3333333333, ans=0.125 2023-12-22 00:58:02,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321000.0, ans=0.125 2023-12-22 00:58:12,056 INFO [train.py:886] (1/4) Epoch 11, batch 500, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4556089.50 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:58:21,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=321133.3333333333, ans=0.07 2023-12-22 00:58:29,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.72 vs. limit=22.5 2023-12-22 00:58:39,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.66 vs. limit=22.5 2023-12-22 00:58:54,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-12-22 00:59:04,101 INFO [train.py:886] (1/4) Epoch 11, batch 550, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4648542.29 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:59:09,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=321400.0, ans=0.125 2023-12-22 00:59:15,292 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.605e+01 2.796e+01 2.937e+01 3.436e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 00:59:15,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=321466.6666666667, ans=0.95 2023-12-22 00:59:17,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=321466.6666666667, ans=12.0 2023-12-22 00:59:18,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=321466.6666666667, ans=0.05 2023-12-22 00:59:24,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. 
limit=15.0 2023-12-22 00:59:33,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321533.3333333333, ans=0.1 2023-12-22 00:59:36,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321600.0, ans=0.125 2023-12-22 00:59:36,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5 2023-12-22 00:59:45,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=321666.6666666667, ans=0.125 2023-12-22 00:59:48,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=321666.6666666667, ans=0.0 2023-12-22 00:59:51,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5 2023-12-22 00:59:51,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=15.0 2023-12-22 00:59:53,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=321666.6666666667, ans=0.125 2023-12-22 00:59:55,404 INFO [train.py:886] (1/4) Epoch 11, batch 600, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4714775.14 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:00:14,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-12-22 01:00:23,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=321866.6666666667, ans=0.0 2023-12-22 01:00:37,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=322000.0, ans=0.125 2023-12-22 01:00:39,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322000.0, ans=0.1 2023-12-22 01:00:42,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=322000.0, ans=0.125 2023-12-22 01:00:47,584 INFO [train.py:886] (1/4) Epoch 11, batch 650, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4763247.19 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:00:58,780 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.635e+01 2.798e+01 2.937e+01 3.276e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-22 01:01:08,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-12-22 01:01:09,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.71 vs. 
limit=10.0 2023-12-22 01:01:16,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.54 vs. limit=22.5 2023-12-22 01:01:18,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322266.6666666667, ans=0.1 2023-12-22 01:01:37,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=322333.3333333333, ans=0.125 2023-12-22 01:01:39,232 INFO [train.py:886] (1/4) Epoch 11, batch 700, loss[loss=0.01802, audio_tagging_loss=0.01802, over 24750.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4804421.97 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:01:50,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=322466.6666666667, ans=0.125 2023-12-22 01:02:10,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=322600.0, ans=0.95 2023-12-22 01:02:11,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0 2023-12-22 01:02:13,777 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.518e-03 2023-12-22 01:02:14,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=322600.0, ans=0.0 2023-12-22 01:02:20,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=322666.6666666667, ans=0.2 2023-12-22 01:02:22,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=322666.6666666667, ans=0.2 2023-12-22 01:02:31,627 INFO [train.py:886] (1/4) Epoch 11, batch 750, loss[loss=0.01699, audio_tagging_loss=0.01699, over 25000.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4839365.58 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:02:33,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=322733.3333333333, ans=0.0 2023-12-22 01:02:44,362 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.637e+01 2.770e+01 2.926e+01 3.754e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 01:02:47,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=322800.0, ans=0.2 2023-12-22 01:03:24,103 INFO [train.py:886] (1/4) Epoch 11, batch 800, loss[loss=0.01208, audio_tagging_loss=0.01208, over 25000.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4868912.99 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:03:24,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=323066.6666666667, ans=0.125 2023-12-22 01:03:27,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=323066.6666666667, ans=0.125 2023-12-22 01:03:32,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.24 vs. 
limit=12.0 2023-12-22 01:03:38,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=323133.3333333333, ans=0.125 2023-12-22 01:03:41,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=323133.3333333333, ans=0.125 2023-12-22 01:03:45,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=323200.0, ans=0.125 2023-12-22 01:03:47,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2023-12-22 01:03:57,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=323266.6666666667, ans=0.125 2023-12-22 01:03:57,816 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.591e-03 2023-12-22 01:04:15,545 INFO [train.py:886] (1/4) Epoch 11, batch 850, loss[loss=0.01525, audio_tagging_loss=0.01525, over 25000.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4893046.55 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:04:21,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=323400.0, ans=0.125 2023-12-22 01:04:28,274 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.659e+01 2.776e+01 2.937e+01 3.524e+01, threshold=5.552e+01, percent-clipped=0.0 2023-12-22 01:05:01,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=323666.6666666667, ans=0.2 2023-12-22 01:05:06,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=323666.6666666667, ans=0.125 2023-12-22 01:05:07,994 INFO [train.py:886] (1/4) Epoch 11, batch 900, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24750.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4907969.59 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:05:10,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2023-12-22 01:05:14,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=323733.3333333333, ans=0.0 2023-12-22 01:05:20,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=323800.0, ans=0.05 2023-12-22 01:05:32,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-12-22 01:05:41,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=323933.3333333333, ans=0.07 2023-12-22 01:06:00,103 INFO [train.py:886] (1/4) Epoch 11, batch 950, loss[loss=0.01645, audio_tagging_loss=0.01645, over 24750.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4908732.76 frames. 
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:06:01,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-22 01:06:12,722 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 2.727e+01 2.870e+01 3.011e+01 3.522e+01, threshold=5.740e+01, percent-clipped=0.0 2023-12-22 01:06:13,785 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:06:15,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=324133.3333333333, ans=0.2 2023-12-22 01:06:16,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=324133.3333333333, ans=0.0 2023-12-22 01:06:18,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=324133.3333333333, ans=0.1 2023-12-22 01:06:21,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=324200.0, ans=15.0 2023-12-22 01:06:22,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=15.0 2023-12-22 01:06:45,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=324333.3333333333, ans=0.125 2023-12-22 01:06:50,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=12.0 2023-12-22 01:06:51,532 INFO [train.py:886] (1/4) Epoch 11, batch 1000, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24750.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4912688.00 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:07:19,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.63 vs. limit=12.0 2023-12-22 01:07:41,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-12-22 01:07:44,403 INFO [train.py:886] (1/4) Epoch 11, batch 1050, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4922215.57 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:07:49,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=324733.3333333333, ans=0.0 2023-12-22 01:07:53,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2023-12-22 01:07:56,581 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+01 2.626e+01 2.739e+01 2.887e+01 3.384e+01, threshold=5.477e+01, percent-clipped=0.0 2023-12-22 01:08:03,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=324800.0, ans=0.5 2023-12-22 01:08:05,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. 
limit=15.0 2023-12-22 01:08:07,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=15.0 2023-12-22 01:08:12,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=12.0 2023-12-22 01:08:19,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324933.3333333333, ans=0.1 2023-12-22 01:08:36,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=325066.6666666667, ans=0.2 2023-12-22 01:08:36,758 INFO [train.py:886] (1/4) Epoch 11, batch 1100, loss[loss=0.01215, audio_tagging_loss=0.01215, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4923419.81 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:08:44,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=325066.6666666667, ans=0.0 2023-12-22 01:08:46,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=325133.3333333333, ans=0.2 2023-12-22 01:09:13,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2023-12-22 01:09:27,740 INFO [train.py:886] (1/4) Epoch 11, batch 1150, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4923048.99 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:09:31,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=325400.0, ans=0.125 2023-12-22 01:09:32,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=325400.0, ans=0.0 2023-12-22 01:09:33,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-12-22 01:09:41,020 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.635e+01 2.806e+01 2.956e+01 3.723e+01, threshold=5.612e+01, percent-clipped=0.0 2023-12-22 01:09:45,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=325466.6666666667, ans=0.0 2023-12-22 01:09:46,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=325466.6666666667, ans=0.025 2023-12-22 01:09:50,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325533.3333333333, ans=0.1 2023-12-22 01:09:51,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=325533.3333333333, ans=0.0 2023-12-22 01:09:51,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=325533.3333333333, ans=0.2 2023-12-22 01:09:57,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. 
limit=15.0 2023-12-22 01:10:11,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=325666.6666666667, ans=0.125 2023-12-22 01:10:19,967 INFO [train.py:886] (1/4) Epoch 11, batch 1200, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4937452.69 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:10:27,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=325733.3333333333, ans=0.0 2023-12-22 01:10:55,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2023-12-22 01:10:56,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=325933.3333333333, ans=0.0 2023-12-22 01:11:12,395 INFO [train.py:886] (1/4) Epoch 11, batch 1250, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4935399.46 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:11:12,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326066.6666666667, ans=0.0 2023-12-22 01:11:14,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=326066.6666666667, ans=0.07 2023-12-22 01:11:18,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=326066.6666666667, ans=0.125 2023-12-22 01:11:19,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=326066.6666666667, ans=0.0 2023-12-22 01:11:25,127 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.715e+01 2.889e+01 3.133e+01 4.404e+01, threshold=5.779e+01, percent-clipped=0.0 2023-12-22 01:11:27,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=326133.3333333333, ans=0.125 2023-12-22 01:11:35,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=326200.0, ans=0.09899494936611666 2023-12-22 01:11:45,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=326266.6666666667, ans=0.025 2023-12-22 01:11:56,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=326333.3333333333, ans=0.125 2023-12-22 01:11:58,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=326333.3333333333, ans=0.0 2023-12-22 01:12:03,788 INFO [train.py:886] (1/4) Epoch 11, batch 1300, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4929131.24 frames. 
], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:12:13,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=326466.6666666667, ans=0.0 2023-12-22 01:12:28,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326533.3333333333, ans=0.125 2023-12-22 01:12:37,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326600.0, ans=0.0 2023-12-22 01:12:41,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326600.0, ans=0.125 2023-12-22 01:12:49,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326666.6666666667, ans=0.0 2023-12-22 01:12:56,094 INFO [train.py:886] (1/4) Epoch 11, batch 1350, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4931865.98 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:13:07,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=326800.0, ans=0.125 2023-12-22 01:13:07,973 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.643e+01 2.800e+01 2.940e+01 3.448e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 01:13:22,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326866.6666666667, ans=0.125 2023-12-22 01:13:23,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=326866.6666666667, ans=0.125 2023-12-22 01:13:25,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=326933.3333333333, ans=0.0 2023-12-22 01:13:41,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=327000.0, ans=0.0 2023-12-22 01:13:46,081 INFO [train.py:886] (1/4) Epoch 11, batch 1400, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4937044.44 frames. 
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:13:46,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=327066.6666666667, ans=0.0 2023-12-22 01:13:51,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327066.6666666667, ans=0.1 2023-12-22 01:14:16,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=327266.6666666667, ans=10.0 2023-12-22 01:14:34,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327333.3333333333, ans=0.1 2023-12-22 01:14:35,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=327333.3333333333, ans=0.125 2023-12-22 01:14:37,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=327333.3333333333, ans=0.2 2023-12-22 01:14:38,944 INFO [train.py:886] (1/4) Epoch 11, batch 1450, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4944022.78 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:14:48,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.51 vs. limit=22.5 2023-12-22 01:14:50,218 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+01 2.629e+01 2.758e+01 2.923e+01 4.200e+01, threshold=5.515e+01, percent-clipped=0.0 2023-12-22 01:15:00,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=327533.3333333333, ans=0.0 2023-12-22 01:15:00,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-12-22 01:15:04,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=327533.3333333333, ans=0.1 2023-12-22 01:15:29,689 INFO [train.py:886] (1/4) Epoch 11, batch 1500, loss[loss=0.0159, audio_tagging_loss=0.0159, over 25000.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4949759.96 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:15:51,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=327866.6666666667, ans=0.0 2023-12-22 01:16:15,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=328000.0, ans=0.2 2023-12-22 01:16:21,549 INFO [train.py:886] (1/4) Epoch 11, batch 1550, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4949791.39 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:16:33,808 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.672e+01 2.837e+01 3.019e+01 3.569e+01, threshold=5.673e+01, percent-clipped=0.0 2023-12-22 01:16:46,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.15 vs. 
limit=15.0 2023-12-22 01:16:51,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=328200.0, ans=0.125 2023-12-22 01:16:51,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=328266.6666666667, ans=0.125 2023-12-22 01:16:53,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=328266.6666666667, ans=0.05 2023-12-22 01:16:58,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328266.6666666667, ans=0.125 2023-12-22 01:17:14,545 INFO [train.py:886] (1/4) Epoch 11, batch 1600, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4946659.33 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:17:16,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=328400.0, ans=0.125 2023-12-22 01:17:31,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-12-22 01:17:35,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=328533.3333333333, ans=0.5 2023-12-22 01:17:41,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328533.3333333333, ans=0.1 2023-12-22 01:17:45,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=328600.0, ans=0.0 2023-12-22 01:18:05,214 INFO [train.py:886] (1/4) Epoch 11, batch 1650, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4941626.35 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:18:07,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=328733.3333333333, ans=0.2 2023-12-22 01:18:18,585 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.598e+01 2.768e+01 2.940e+01 3.602e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-22 01:18:46,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.80 vs. limit=15.0 2023-12-22 01:18:50,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=329000.0, ans=0.5 2023-12-22 01:18:53,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-22 01:18:57,182 INFO [train.py:886] (1/4) Epoch 11, batch 1700, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24269.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4939370.74 frames. 
], batch size: 101, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:19:00,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=329066.6666666667, ans=0.2 2023-12-22 01:19:10,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=329133.3333333333, ans=0.0 2023-12-22 01:19:25,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=329200.0, ans=0.1 2023-12-22 01:19:37,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329333.3333333333, ans=0.1 2023-12-22 01:19:49,227 INFO [train.py:886] (1/4) Epoch 11, batch 1750, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4941697.41 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:20:01,332 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.366e+01 2.668e+01 2.806e+01 3.006e+01 3.774e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-22 01:20:16,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329533.3333333333, ans=0.0 2023-12-22 01:20:20,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2023-12-22 01:20:23,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=329600.0, ans=0.0 2023-12-22 01:20:35,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-12-22 01:20:36,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329666.6666666667, ans=0.1 2023-12-22 01:20:39,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=329733.3333333333, ans=0.0 2023-12-22 01:20:40,641 INFO [train.py:886] (1/4) Epoch 11, batch 1800, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4949464.00 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:20:58,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=329800.0, ans=0.0 2023-12-22 01:20:58,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=329800.0, ans=0.125 2023-12-22 01:21:27,005 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:21:28,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=330000.0, ans=0.125 2023-12-22 01:21:32,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.79 vs. limit=22.5 2023-12-22 01:21:33,115 INFO [train.py:886] (1/4) Epoch 11, batch 1850, loss[loss=0.01663, audio_tagging_loss=0.01663, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4952625.24 frames. 
], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:21:35,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=330066.6666666667, ans=0.2 2023-12-22 01:21:45,285 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.664e+01 2.766e+01 2.942e+01 3.434e+01, threshold=5.531e+01, percent-clipped=0.0 2023-12-22 01:21:48,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=15.0 2023-12-22 01:21:54,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=330200.0, ans=0.0 2023-12-22 01:21:56,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.17 vs. limit=22.5 2023-12-22 01:22:09,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2023-12-22 01:22:13,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2023-12-22 01:22:17,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=330333.3333333333, ans=0.125 2023-12-22 01:22:19,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=330333.3333333333, ans=0.125 2023-12-22 01:22:20,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=330333.3333333333, ans=0.0 2023-12-22 01:22:20,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2023-12-22 01:22:21,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=330333.3333333333, ans=0.125 2023-12-22 01:22:24,768 INFO [train.py:886] (1/4) Epoch 11, batch 1900, loss[loss=0.0141, audio_tagging_loss=0.0141, over 24050.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4947365.29 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:22:42,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-12-22 01:22:51,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-12-22 01:23:04,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330600.0, ans=0.1 2023-12-22 01:23:15,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=330666.6666666667, ans=0.125 2023-12-22 01:23:16,942 INFO [train.py:886] (1/4) Epoch 11, batch 1950, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4946998.38 frames. 
], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:23:29,046 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.806e+01 2.987e+01 3.356e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-22 01:23:34,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2023-12-22 01:24:04,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=331000.0, ans=0.125 2023-12-22 01:24:05,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=331000.0, ans=0.0 2023-12-22 01:24:09,400 INFO [train.py:886] (1/4) Epoch 11, batch 2000, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4946098.58 frames. ], batch size: 100, lr: 9.99e-03, grad_scale: 64.0 2023-12-22 01:24:18,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=331133.3333333333, ans=0.0 2023-12-22 01:24:48,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.75 vs. limit=22.5 2023-12-22 01:25:00,498 INFO [train.py:886] (1/4) Epoch 11, batch 2050, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4947450.37 frames. ], batch size: 100, lr: 9.99e-03, grad_scale: 64.0 2023-12-22 01:25:07,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=331400.0, ans=0.04949747468305833 2023-12-22 01:25:10,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=331466.6666666667, ans=0.0 2023-12-22 01:25:13,297 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.569e+01 2.759e+01 2.903e+01 3.847e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-22 01:25:27,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=331533.3333333333, ans=0.125 2023-12-22 01:25:53,375 INFO [train.py:886] (1/4) Epoch 11, batch 2100, loss[loss=0.0116, audio_tagging_loss=0.0116, over 21056.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4950248.40 frames. 
], batch size: 107, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:25:56,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331733.3333333333, ans=0.1 2023-12-22 01:25:58,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=331733.3333333333, ans=0.125 2023-12-22 01:26:01,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=331733.3333333333, ans=0.0 2023-12-22 01:26:08,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=331800.0, ans=0.125 2023-12-22 01:26:11,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=331800.0, ans=0.0 2023-12-22 01:26:15,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=331866.6666666667, ans=0.125 2023-12-22 01:26:15,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=331866.6666666667, ans=0.125 2023-12-22 01:26:16,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-12-22 01:26:37,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=332000.0, ans=0.2 2023-12-22 01:26:45,352 INFO [train.py:886] (1/4) Epoch 11, batch 2150, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4956067.45 frames. ], batch size: 100, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:26:50,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=332066.6666666667, ans=0.0 2023-12-22 01:26:58,039 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 2.658e+01 2.791e+01 2.942e+01 3.654e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 01:27:05,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332200.0, ans=0.1 2023-12-22 01:27:06,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=332200.0, ans=0.0 2023-12-22 01:27:06,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=332200.0, ans=0.125 2023-12-22 01:27:07,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=332200.0, ans=0.125 2023-12-22 01:27:18,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. 
limit=15.0 2023-12-22 01:27:22,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=332266.6666666667, ans=0.0 2023-12-22 01:27:26,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=332333.3333333333, ans=0.0 2023-12-22 01:27:31,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=332333.3333333333, ans=0.125 2023-12-22 01:27:34,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=332333.3333333333, ans=0.125 2023-12-22 01:27:36,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=332400.0, ans=0.125 2023-12-22 01:27:37,468 INFO [train.py:886] (1/4) Epoch 11, batch 2200, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4950819.71 frames. ], batch size: 99, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:27:37,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=332400.0, ans=0.0 2023-12-22 01:27:37,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=332400.0, ans=0.125 2023-12-22 01:27:53,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=332466.6666666667, ans=0.125 2023-12-22 01:27:53,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=332466.6666666667, ans=0.125 2023-12-22 01:28:00,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=332533.3333333333, ans=0.125 2023-12-22 01:28:03,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332533.3333333333, ans=0.1 2023-12-22 01:28:25,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.87 vs. limit=22.5 2023-12-22 01:28:29,521 INFO [train.py:886] (1/4) Epoch 11, batch 2250, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4946848.59 frames. 
], batch size: 99, lr: 9.97e-03, grad_scale: 64.0 2023-12-22 01:28:41,538 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.628e+01 2.793e+01 2.962e+01 3.343e+01, threshold=5.585e+01, percent-clipped=0.0 2023-12-22 01:28:49,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=332866.6666666667, ans=0.125 2023-12-22 01:28:58,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332866.6666666667, ans=0.1 2023-12-22 01:29:07,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=332933.3333333333, ans=0.125 2023-12-22 01:29:09,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=333000.0, ans=0.125 2023-12-22 01:29:15,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=333000.0, ans=0.0 2023-12-22 01:29:17,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.81 vs. limit=5.0 2023-12-22 01:29:21,146 INFO [train.py:886] (1/4) Epoch 11, batch 2300, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4945215.25 frames. ], batch size: 99, lr: 9.97e-03, grad_scale: 64.0 2023-12-22 01:30:12,976 INFO [train.py:886] (1/4) Epoch 11, batch 2350, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4950002.91 frames. ], batch size: 100, lr: 9.96e-03, grad_scale: 64.0 2023-12-22 01:30:24,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=12.0 2023-12-22 01:30:25,750 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.666e+01 2.800e+01 2.976e+01 3.525e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 01:30:26,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=333466.6666666667, ans=0.125 2023-12-22 01:30:32,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=333466.6666666667, ans=0.1 2023-12-22 01:31:05,353 INFO [train.py:886] (1/4) Epoch 11, batch 2400, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4954724.16 frames. ], batch size: 100, lr: 9.96e-03, grad_scale: 64.0 2023-12-22 01:31:13,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=333733.3333333333, ans=0.035 2023-12-22 01:31:21,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=333800.0, ans=0.2 2023-12-22 01:31:25,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=333866.6666666667, ans=0.0 2023-12-22 01:31:27,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. 
limit=15.0 2023-12-22 01:31:56,642 INFO [train.py:886] (1/4) Epoch 11, batch 2450, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4955104.63 frames. ], batch size: 100, lr: 9.95e-03, grad_scale: 64.0 2023-12-22 01:32:02,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334066.6666666667, ans=0.125 2023-12-22 01:32:02,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-12-22 01:32:03,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=334066.6666666667, ans=0.125 2023-12-22 01:32:04,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334066.6666666667, ans=0.125 2023-12-22 01:32:09,953 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.647e+01 2.778e+01 2.930e+01 3.813e+01, threshold=5.556e+01, percent-clipped=0.0 2023-12-22 01:32:28,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=334266.6666666667, ans=0.07 2023-12-22 01:32:29,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=334266.6666666667, ans=0.125 2023-12-22 01:32:34,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.78 vs. limit=22.5 2023-12-22 01:32:49,593 INFO [train.py:886] (1/4) Epoch 11, batch 2500, loss[loss=0.01564, audio_tagging_loss=0.01564, over 24750.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4951602.12 frames. ], batch size: 99, lr: 9.95e-03, grad_scale: 64.0 2023-12-22 01:32:50,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=334400.0, ans=0.07 2023-12-22 01:32:53,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=15.0 2023-12-22 01:32:55,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=334400.0, ans=0.125 2023-12-22 01:32:57,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=334400.0, ans=0.025 2023-12-22 01:33:07,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=334466.6666666667, ans=0.125 2023-12-22 01:33:16,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334533.3333333333, ans=0.125 2023-12-22 01:33:23,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=334600.0, ans=0.07 2023-12-22 01:33:39,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=334733.3333333333, ans=0.0 2023-12-22 01:33:41,393 INFO [train.py:886] (1/4) Epoch 11, batch 2550, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. 
], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4951243.68 frames. ], batch size: 100, lr: 9.94e-03, grad_scale: 64.0 2023-12-22 01:33:47,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=334733.3333333333, ans=0.0 2023-12-22 01:33:53,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=334800.0, ans=0.125 2023-12-22 01:33:54,293 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+01 2.667e+01 2.808e+01 2.946e+01 3.351e+01, threshold=5.616e+01, percent-clipped=0.0 2023-12-22 01:33:56,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=334800.0, ans=15.0 2023-12-22 01:34:00,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=334800.0, ans=0.035 2023-12-22 01:34:11,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2023-12-22 01:34:21,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-12-22 01:34:23,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-22 01:34:33,209 INFO [train.py:886] (1/4) Epoch 11, batch 2600, loss[loss=0.01589, audio_tagging_loss=0.01589, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4944894.10 frames. ], batch size: 99, lr: 9.94e-03, grad_scale: 64.0 2023-12-22 01:34:43,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.85 vs. limit=15.0 2023-12-22 01:34:57,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=335200.0, ans=0.125 2023-12-22 01:35:06,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=335266.6666666667, ans=0.0 2023-12-22 01:35:25,031 INFO [train.py:886] (1/4) Epoch 11, batch 2650, loss[loss=0.01222, audio_tagging_loss=0.01222, over 23984.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4947808.52 frames. ], batch size: 100, lr: 9.93e-03, grad_scale: 64.0 2023-12-22 01:35:36,559 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.628e+01 2.801e+01 2.924e+01 4.214e+01, threshold=5.602e+01, percent-clipped=0.0 2023-12-22 01:36:07,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=335666.6666666667, ans=0.0 2023-12-22 01:36:15,984 INFO [train.py:886] (1/4) Epoch 11, batch 2700, loss[loss=0.01753, audio_tagging_loss=0.01753, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4952511.94 frames. 
], batch size: 100, lr: 9.93e-03, grad_scale: 128.0 2023-12-22 01:36:23,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335733.3333333333, ans=0.1 2023-12-22 01:36:40,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-12-22 01:36:42,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=335866.6666666667, ans=0.1 2023-12-22 01:36:44,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=335866.6666666667, ans=0.2 2023-12-22 01:36:59,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-12-22 01:37:06,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=336000.0, ans=0.1 2023-12-22 01:37:08,126 INFO [train.py:886] (1/4) Epoch 11, batch 2750, loss[loss=0.01842, audio_tagging_loss=0.01842, over 24896.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4951503.98 frames. ], batch size: 100, lr: 9.92e-03, grad_scale: 64.0 2023-12-22 01:37:09,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=336066.6666666667, ans=0.2 2023-12-22 01:37:21,138 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.681e+01 2.814e+01 2.995e+01 3.460e+01, threshold=5.627e+01, percent-clipped=0.0 2023-12-22 01:37:22,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=336133.3333333333, ans=0.125 2023-12-22 01:37:42,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=336266.6666666667, ans=0.125 2023-12-22 01:37:43,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=336266.6666666667, ans=0.125 2023-12-22 01:37:45,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336266.6666666667, ans=0.1 2023-12-22 01:37:49,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=336333.3333333333, ans=0.1 2023-12-22 01:37:59,152 INFO [train.py:886] (1/4) Epoch 11, batch 2800, loss[loss=0.01743, audio_tagging_loss=0.01743, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4954058.34 frames. ], batch size: 99, lr: 9.92e-03, grad_scale: 64.0 2023-12-22 01:38:09,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=12.0 2023-12-22 01:38:11,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.86 vs. limit=5.0 2023-12-22 01:38:25,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.49 vs. 
limit=22.5 2023-12-22 01:38:36,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=15.0 2023-12-22 01:38:40,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=15.0 2023-12-22 01:38:52,012 INFO [train.py:886] (1/4) Epoch 11, batch 2850, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4947817.20 frames. ], batch size: 99, lr: 9.91e-03, grad_scale: 64.0 2023-12-22 01:38:53,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=336733.3333333333, ans=0.125 2023-12-22 01:38:54,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.73 vs. limit=10.0 2023-12-22 01:39:05,710 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.677e+01 2.844e+01 3.022e+01 3.570e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-22 01:39:14,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336866.6666666667, ans=0.1 2023-12-22 01:39:16,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336866.6666666667, ans=0.1 2023-12-22 01:39:21,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=336866.6666666667, ans=0.1 2023-12-22 01:39:24,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336933.3333333333, ans=0.1 2023-12-22 01:39:27,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-12-22 01:39:31,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=336933.3333333333, ans=0.0 2023-12-22 01:39:45,145 INFO [train.py:886] (1/4) Epoch 11, batch 2900, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4945913.55 frames. 
], batch size: 100, lr: 9.91e-03, grad_scale: 64.0 2023-12-22 01:39:45,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337066.6666666667, ans=0.0 2023-12-22 01:39:45,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=337066.6666666667, ans=0.0 2023-12-22 01:40:13,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=337200.0, ans=0.0 2023-12-22 01:40:16,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=337266.6666666667, ans=15.0 2023-12-22 01:40:20,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=337266.6666666667, ans=0.0 2023-12-22 01:40:23,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=15.0 2023-12-22 01:40:27,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=337333.3333333333, ans=0.0 2023-12-22 01:40:34,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=337333.3333333333, ans=0.125 2023-12-22 01:40:36,307 INFO [train.py:886] (1/4) Epoch 11, batch 2950, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4944728.85 frames. ], batch size: 100, lr: 9.90e-03, grad_scale: 64.0 2023-12-22 01:40:38,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=337400.0, ans=0.125 2023-12-22 01:40:50,444 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.642e+01 2.776e+01 2.943e+01 5.115e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-22 01:40:56,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=337466.6666666667, ans=0.0 2023-12-22 01:40:59,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-12-22 01:41:28,792 INFO [train.py:886] (1/4) Epoch 11, batch 3000, loss[loss=0.01056, audio_tagging_loss=0.01056, over 25000.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4945213.16 frames. ], batch size: 100, lr: 9.90e-03, grad_scale: 64.0 2023-12-22 01:41:28,793 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 01:41:41,385 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1934, 1.4499, 4.4054, 4.2403], device='cuda:1') 2023-12-22 01:41:49,995 INFO [train.py:917] (1/4) Epoch 11, validation: loss=0.03489, audio_tagging_loss=0.03489, over 3737520.00 frames. 2023-12-22 01:41:49,995 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 01:41:52,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=337733.3333333333, ans=0.125 2023-12-22 01:41:53,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.01 vs. 
limit=22.5 2023-12-22 01:42:07,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2023-12-22 01:42:35,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=338000.0, ans=0.125 2023-12-22 01:42:36,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338000.0, ans=0.1 2023-12-22 01:42:40,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=338066.6666666667, ans=0.07 2023-12-22 01:42:40,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=338066.6666666667, ans=0.125 2023-12-22 01:42:41,462 INFO [train.py:886] (1/4) Epoch 11, batch 3050, loss[loss=0.01843, audio_tagging_loss=0.01843, over 25000.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4954532.58 frames. ], batch size: 100, lr: 9.89e-03, grad_scale: 64.0 2023-12-22 01:42:55,340 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.694e+01 2.782e+01 2.959e+01 3.737e+01, threshold=5.564e+01, percent-clipped=0.0 2023-12-22 01:42:55,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-12-22 01:42:56,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=338133.3333333333, ans=0.2 2023-12-22 01:43:04,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=338200.0, ans=0.2 2023-12-22 01:43:04,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=338200.0, ans=0.0 2023-12-22 01:43:15,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338266.6666666667, ans=0.1 2023-12-22 01:43:15,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338266.6666666667, ans=0.1 2023-12-22 01:43:19,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=338266.6666666667, ans=0.2 2023-12-22 01:43:22,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=338333.3333333333, ans=0.125 2023-12-22 01:43:23,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=338333.3333333333, ans=0.0 2023-12-22 01:43:28,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2023-12-22 01:43:33,637 INFO [train.py:886] (1/4) Epoch 11, batch 3100, loss[loss=0.01653, audio_tagging_loss=0.01653, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4955869.48 frames. 
], batch size: 99, lr: 9.89e-03, grad_scale: 64.0 2023-12-22 01:43:42,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338466.6666666667, ans=0.1 2023-12-22 01:43:46,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=338466.6666666667, ans=0.125 2023-12-22 01:43:55,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=338533.3333333333, ans=0.125 2023-12-22 01:43:57,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2023-12-22 01:44:19,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=338666.6666666667, ans=0.125 2023-12-22 01:44:20,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=338666.6666666667, ans=0.125 2023-12-22 01:44:24,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=338733.3333333333, ans=0.125 2023-12-22 01:44:24,961 INFO [train.py:886] (1/4) Epoch 11, batch 3150, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4948838.47 frames. ], batch size: 100, lr: 9.88e-03, grad_scale: 64.0 2023-12-22 01:44:33,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=338733.3333333333, ans=0.0 2023-12-22 01:44:38,754 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.710e+01 2.835e+01 3.003e+01 4.249e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-22 01:44:39,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-22 01:44:49,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=338866.6666666667, ans=0.125 2023-12-22 01:44:58,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=338933.3333333333, ans=0.125 2023-12-22 01:45:10,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=339000.0, ans=0.2 2023-12-22 01:45:14,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=339000.0, ans=0.2 2023-12-22 01:45:17,426 INFO [train.py:886] (1/4) Epoch 11, batch 3200, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4946551.88 frames. ], batch size: 100, lr: 9.88e-03, grad_scale: 64.0 2023-12-22 01:45:28,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2023-12-22 01:45:29,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. 
limit=15.0 2023-12-22 01:45:34,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339133.3333333333, ans=0.125 2023-12-22 01:45:46,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339200.0, ans=0.1 2023-12-22 01:45:51,880 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.756e-01 2023-12-22 01:46:04,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.142e-01 2023-12-22 01:46:09,473 INFO [train.py:886] (1/4) Epoch 11, batch 3250, loss[loss=0.01464, audio_tagging_loss=0.01464, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4946453.30 frames. ], batch size: 100, lr: 9.87e-03, grad_scale: 64.0 2023-12-22 01:46:18,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=339466.6666666667, ans=0.125 2023-12-22 01:46:23,057 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.400e+01 2.628e+01 2.776e+01 2.963e+01 3.661e+01, threshold=5.553e+01, percent-clipped=0.0 2023-12-22 01:46:27,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=339466.6666666667, ans=0.0 2023-12-22 01:46:37,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=339533.3333333333, ans=0.0 2023-12-22 01:46:44,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=339600.0, ans=0.0 2023-12-22 01:47:01,039 INFO [train.py:886] (1/4) Epoch 11, batch 3300, loss[loss=0.01579, audio_tagging_loss=0.01579, over 22197.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4943840.30 frames. ], batch size: 107, lr: 9.87e-03, grad_scale: 64.0 2023-12-22 01:47:23,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=339866.6666666667, ans=0.125 2023-12-22 01:47:34,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=339933.3333333333, ans=0.0 2023-12-22 01:47:34,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=339933.3333333333, ans=0.05 2023-12-22 01:47:41,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=340000.0, ans=0.0 2023-12-22 01:47:52,833 INFO [train.py:886] (1/4) Epoch 11, batch 3350, loss[loss=0.01608, audio_tagging_loss=0.01608, over 25000.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4947484.48 frames. ], batch size: 100, lr: 9.86e-03, grad_scale: 64.0 2023-12-22 01:47:56,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.50 vs. limit=22.5 2023-12-22 01:48:00,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.01 vs. 
limit=12.0 2023-12-22 01:48:06,416 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.626e+01 2.784e+01 2.981e+01 3.630e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-22 01:48:11,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=340133.3333333333, ans=0.0 2023-12-22 01:48:16,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=340200.0, ans=0.125 2023-12-22 01:48:36,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=340333.3333333333, ans=0.0 2023-12-22 01:48:45,043 INFO [train.py:886] (1/4) Epoch 11, batch 3400, loss[loss=0.01583, audio_tagging_loss=0.01583, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4949617.24 frames. ], batch size: 100, lr: 9.86e-03, grad_scale: 64.0 2023-12-22 01:48:53,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=340400.0, ans=0.2 2023-12-22 01:49:13,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=340533.3333333333, ans=0.125 2023-12-22 01:49:22,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=340600.0, ans=0.07 2023-12-22 01:49:36,295 INFO [train.py:886] (1/4) Epoch 11, batch 3450, loss[loss=0.01548, audio_tagging_loss=0.01548, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4943250.05 frames. ], batch size: 99, lr: 9.86e-03, grad_scale: 64.0 2023-12-22 01:49:39,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=340733.3333333333, ans=0.025 2023-12-22 01:49:49,461 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.658e+01 2.786e+01 2.914e+01 3.460e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 01:49:55,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=340800.0, ans=0.125 2023-12-22 01:49:58,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-22 01:50:20,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=341000.0, ans=0.0 2023-12-22 01:50:27,739 INFO [train.py:886] (1/4) Epoch 11, batch 3500, loss[loss=0.0138, audio_tagging_loss=0.0138, over 24750.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4941662.50 frames. ], batch size: 99, lr: 9.85e-03, grad_scale: 64.0 2023-12-22 01:50:29,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. 
limit=15.0 2023-12-22 01:50:35,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=341066.6666666667, ans=0.125 2023-12-22 01:50:38,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=341133.3333333333, ans=0.125 2023-12-22 01:51:01,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2023-12-22 01:51:02,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=341266.6666666667, ans=0.125 2023-12-22 01:51:17,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=341333.3333333333, ans=0.0 2023-12-22 01:51:17,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=341333.3333333333, ans=0.125 2023-12-22 01:51:18,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=12.0 2023-12-22 01:51:20,417 INFO [train.py:886] (1/4) Epoch 11, batch 3550, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4939708.33 frames. ], batch size: 100, lr: 9.85e-03, grad_scale: 64.0 2023-12-22 01:51:20,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=341400.0, ans=0.0 2023-12-22 01:51:33,442 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.655e+01 2.795e+01 3.045e+01 3.842e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 01:51:36,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=341466.6666666667, ans=0.04949747468305833 2023-12-22 01:51:42,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=341533.3333333333, ans=0.05 2023-12-22 01:51:45,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341533.3333333333, ans=0.1 2023-12-22 01:52:00,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=341600.0, ans=0.0 2023-12-22 01:52:01,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=341666.6666666667, ans=0.2 2023-12-22 01:52:03,877 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:52:12,048 INFO [train.py:886] (1/4) Epoch 11, batch 3600, loss[loss=0.014, audio_tagging_loss=0.014, over 24032.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4944653.42 frames. 
], batch size: 100, lr: 9.84e-03, grad_scale: 64.0 2023-12-22 01:52:12,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=341733.3333333333, ans=0.0 2023-12-22 01:52:26,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=341800.0, ans=0.0 2023-12-22 01:52:40,946 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.327e-01 2023-12-22 01:52:45,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=341933.3333333333, ans=0.0 2023-12-22 01:52:45,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=341933.3333333333, ans=0.0 2023-12-22 01:52:52,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342000.0, ans=0.1 2023-12-22 01:53:03,697 INFO [train.py:886] (1/4) Epoch 11, batch 3650, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4951004.61 frames. ], batch size: 100, lr: 9.84e-03, grad_scale: 64.0 2023-12-22 01:53:05,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=342066.6666666667, ans=0.0 2023-12-22 01:53:07,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0 2023-12-22 01:53:12,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=342066.6666666667, ans=0.0 2023-12-22 01:53:17,499 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.641e+01 2.779e+01 2.904e+01 3.423e+01, threshold=5.558e+01, percent-clipped=0.0 2023-12-22 01:53:20,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2023-12-22 01:53:35,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=342266.6666666667, ans=0.0 2023-12-22 01:53:54,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342400.0, ans=0.125 2023-12-22 01:53:55,476 INFO [train.py:886] (1/4) Epoch 11, batch 3700, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4955820.34 frames. 
], batch size: 100, lr: 9.83e-03, grad_scale: 64.0 2023-12-22 01:53:59,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342400.0, ans=0.1 2023-12-22 01:54:05,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.122e-02 2023-12-22 01:54:16,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=342533.3333333333, ans=0.0 2023-12-22 01:54:17,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342533.3333333333, ans=0.1 2023-12-22 01:54:18,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=342533.3333333333, ans=0.125 2023-12-22 01:54:43,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=342666.6666666667, ans=0.125 2023-12-22 01:54:47,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=342733.3333333333, ans=0.0 2023-12-22 01:54:48,096 INFO [train.py:886] (1/4) Epoch 11, batch 3750, loss[loss=0.01688, audio_tagging_loss=0.01688, over 24940.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4948570.43 frames. ], batch size: 100, lr: 9.83e-03, grad_scale: 64.0 2023-12-22 01:54:50,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=342733.3333333333, ans=0.125 2023-12-22 01:55:01,196 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.750e+01 2.894e+01 3.049e+01 3.560e+01, threshold=5.788e+01, percent-clipped=0.0 2023-12-22 01:55:04,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=342800.0, ans=0.125 2023-12-22 01:55:11,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=342866.6666666667, ans=0.125 2023-12-22 01:55:18,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=342933.3333333333, ans=0.0 2023-12-22 01:55:33,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=343000.0, ans=0.125 2023-12-22 01:55:37,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=343000.0, ans=0.0 2023-12-22 01:55:39,609 INFO [train.py:886] (1/4) Epoch 11, batch 3800, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4942476.15 frames. ], batch size: 99, lr: 9.82e-03, grad_scale: 64.0 2023-12-22 01:55:43,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2023-12-22 01:55:50,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-12-22 01:55:57,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.05 vs. 
limit=15.0 2023-12-22 01:55:58,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=343133.3333333333, ans=0.0 2023-12-22 01:56:11,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=343266.6666666667, ans=0.09899494936611666 2023-12-22 01:56:12,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-12-22 01:56:14,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.04 vs. limit=10.0 2023-12-22 01:56:20,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=343333.3333333333, ans=0.0 2023-12-22 01:56:25,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=31.78 vs. limit=22.5 2023-12-22 01:56:25,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=343333.3333333333, ans=0.0 2023-12-22 01:56:31,189 INFO [train.py:886] (1/4) Epoch 11, batch 3850, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4937547.41 frames. ], batch size: 100, lr: 9.82e-03, grad_scale: 64.0 2023-12-22 01:56:31,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=343400.0, ans=0.035 2023-12-22 01:56:44,874 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 2.640e+01 2.758e+01 2.966e+01 3.475e+01, threshold=5.515e+01, percent-clipped=0.0 2023-12-22 01:56:46,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=343466.6666666667, ans=0.2 2023-12-22 01:56:48,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.99 vs. limit=22.5 2023-12-22 01:56:49,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-12-22 01:56:55,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=343533.3333333333, ans=0.125 2023-12-22 01:56:55,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-12-22 01:57:23,758 INFO [train.py:886] (1/4) Epoch 11, batch 3900, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4941671.78 frames. ], batch size: 100, lr: 9.81e-03, grad_scale: 64.0 2023-12-22 01:57:26,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=15.0 2023-12-22 01:57:46,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2023-12-22 01:58:15,479 INFO [train.py:886] (1/4) Epoch 11, batch 3950, loss[loss=0.0151, audio_tagging_loss=0.0151, over 25000.00 frames. 
], tot_loss[loss=0.01478, audio_tagging_loss=0.01478, over 4947308.42 frames. ], batch size: 100, lr: 9.81e-03, grad_scale: 64.0 2023-12-22 01:58:15,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=344066.6666666667, ans=0.2 2023-12-22 01:58:24,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344066.6666666667, ans=0.0 2023-12-22 01:58:29,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.678e+01 2.766e+01 2.913e+01 3.341e+01, threshold=5.531e+01, percent-clipped=0.0 2023-12-22 01:58:31,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344133.3333333333, ans=0.1 2023-12-22 01:58:36,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=344200.0, ans=0.04949747468305833 2023-12-22 01:59:00,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=344333.3333333333, ans=0.125 2023-12-22 01:59:08,121 INFO [train.py:886] (1/4) Epoch 11, batch 4000, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4947587.71 frames. ], batch size: 100, lr: 9.80e-03, grad_scale: 64.0 2023-12-22 01:59:13,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=344400.0, ans=0.125 2023-12-22 01:59:20,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-12-22 01:59:31,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=344533.3333333333, ans=0.125 2023-12-22 01:59:53,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=344666.6666666667, ans=0.125 2023-12-22 01:59:59,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=344733.3333333333, ans=0.125 2023-12-22 01:59:59,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-12-22 02:00:00,091 INFO [train.py:886] (1/4) Epoch 11, batch 4050, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4948420.14 frames. ], batch size: 100, lr: 9.80e-03, grad_scale: 64.0 2023-12-22 02:00:02,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=344733.3333333333, ans=0.125 2023-12-22 02:00:13,200 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.657e+01 2.838e+01 3.013e+01 3.411e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 02:00:30,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.44 vs. 
limit=22.5 2023-12-22 02:00:30,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=344933.3333333333, ans=0.035 2023-12-22 02:00:37,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=344933.3333333333, ans=0.125 2023-12-22 02:00:42,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=345000.0, ans=0.05 2023-12-22 02:00:48,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=345000.0, ans=0.125 2023-12-22 02:00:51,610 INFO [train.py:886] (1/4) Epoch 11, batch 4100, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24750.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4938542.14 frames. ], batch size: 99, lr: 9.79e-03, grad_scale: 64.0 2023-12-22 02:00:59,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=345066.6666666667, ans=0.2 2023-12-22 02:01:02,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=345133.3333333333, ans=0.0 2023-12-22 02:01:05,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=345133.3333333333, ans=0.0 2023-12-22 02:01:23,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=345266.6666666667, ans=0.0 2023-12-22 02:01:25,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=345266.6666666667, ans=0.0 2023-12-22 02:01:28,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=345266.6666666667, ans=0.2 2023-12-22 02:01:44,244 INFO [train.py:886] (1/4) Epoch 11, batch 4150, loss[loss=0.0176, audio_tagging_loss=0.0176, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4939169.30 frames. ], batch size: 100, lr: 9.79e-03, grad_scale: 64.0 2023-12-22 02:01:45,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=345400.0, ans=0.125 2023-12-22 02:01:57,137 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.652e+01 2.805e+01 3.053e+01 3.563e+01, threshold=5.610e+01, percent-clipped=0.0 2023-12-22 02:01:59,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=345466.6666666667, ans=0.125 2023-12-22 02:01:59,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345466.6666666667, ans=0.1 2023-12-22 02:02:26,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=345666.6666666667, ans=0.125 2023-12-22 02:02:26,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=345666.6666666667, ans=0.125 2023-12-22 02:02:35,111 INFO [train.py:886] (1/4) Epoch 11, batch 4200, loss[loss=0.01262, audio_tagging_loss=0.01262, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4945409.63 frames. 
], batch size: 100, lr: 9.79e-03, grad_scale: 64.0 2023-12-22 02:02:55,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=345800.0, ans=0.0 2023-12-22 02:03:00,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=345866.6666666667, ans=0.125 2023-12-22 02:03:01,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=345866.6666666667, ans=0.125 2023-12-22 02:03:11,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=345933.3333333333, ans=0.0 2023-12-22 02:03:27,980 INFO [train.py:886] (1/4) Epoch 11, batch 4250, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4946870.81 frames. ], batch size: 100, lr: 9.78e-03, grad_scale: 64.0 2023-12-22 02:03:40,220 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.635e+01 2.806e+01 3.008e+01 3.451e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 02:03:41,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346133.3333333333, ans=0.1 2023-12-22 02:03:45,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=346133.3333333333, ans=0.125 2023-12-22 02:03:49,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2023-12-22 02:04:03,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.34 vs. limit=22.5 2023-12-22 02:04:08,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=346333.3333333333, ans=0.0 2023-12-22 02:04:18,636 INFO [train.py:886] (1/4) Epoch 11, batch 4300, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4954390.11 frames. ], batch size: 100, lr: 9.78e-03, grad_scale: 64.0 2023-12-22 02:04:36,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=346466.6666666667, ans=0.125 2023-12-22 02:04:36,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=22.5 2023-12-22 02:04:38,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs. 
limit=22.5 2023-12-22 02:04:43,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=346533.3333333333, ans=0.125 2023-12-22 02:04:44,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=346533.3333333333, ans=0.04949747468305833 2023-12-22 02:04:51,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=346600.0, ans=0.5 2023-12-22 02:04:55,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=346600.0, ans=0.05 2023-12-22 02:05:12,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=346733.3333333333, ans=0.0 2023-12-22 02:05:13,321 INFO [train.py:886] (1/4) Epoch 11, batch 4350, loss[loss=0.01726, audio_tagging_loss=0.01726, over 24750.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4958376.98 frames. ], batch size: 99, lr: 9.77e-03, grad_scale: 64.0 2023-12-22 02:05:25,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346800.0, ans=0.1 2023-12-22 02:05:26,216 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.594e+01 2.762e+01 2.910e+01 3.539e+01, threshold=5.524e+01, percent-clipped=0.0 2023-12-22 02:05:30,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346800.0, ans=0.1 2023-12-22 02:05:42,175 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.942e-02 2023-12-22 02:05:46,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=346933.3333333333, ans=0.2 2023-12-22 02:05:52,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2023-12-22 02:05:55,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=12.0 2023-12-22 02:05:58,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=347000.0, ans=0.2 2023-12-22 02:06:04,915 INFO [train.py:886] (1/4) Epoch 11, batch 4400, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4948088.50 frames. ], batch size: 99, lr: 9.77e-03, grad_scale: 64.0 2023-12-22 02:06:38,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=347266.6666666667, ans=0.0 2023-12-22 02:06:45,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2023-12-22 02:06:57,393 INFO [train.py:886] (1/4) Epoch 11, batch 4450, loss[loss=0.01182, audio_tagging_loss=0.01182, over 21916.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4936898.47 frames. 
], batch size: 107, lr: 9.76e-03, grad_scale: 64.0 2023-12-22 02:06:57,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=347400.0, ans=0.125 2023-12-22 02:07:06,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347466.6666666667, ans=0.1 2023-12-22 02:07:10,414 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.637e+01 2.813e+01 2.949e+01 3.771e+01, threshold=5.626e+01, percent-clipped=0.0 2023-12-22 02:07:25,378 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.718e-03 2023-12-22 02:07:37,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=347600.0, ans=0.125 2023-12-22 02:07:39,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=347666.6666666667, ans=0.125 2023-12-22 02:07:49,035 INFO [train.py:886] (1/4) Epoch 11, batch 4500, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4938657.62 frames. ], batch size: 100, lr: 9.76e-03, grad_scale: 64.0 2023-12-22 02:07:55,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.20 vs. limit=22.5 2023-12-22 02:07:58,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2023-12-22 02:08:10,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347866.6666666667, ans=0.1 2023-12-22 02:08:22,709 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:08:23,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=347933.3333333333, ans=0.1 2023-12-22 02:08:25,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=347933.3333333333, ans=0.125 2023-12-22 02:08:28,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=347933.3333333333, ans=0.0 2023-12-22 02:08:41,331 INFO [train.py:886] (1/4) Epoch 11, batch 4550, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4940477.17 frames. 
], batch size: 100, lr: 9.75e-03, grad_scale: 64.0 2023-12-22 02:08:45,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=348066.6666666667, ans=0.125 2023-12-22 02:08:55,125 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.595e+01 2.747e+01 2.922e+01 3.602e+01, threshold=5.493e+01, percent-clipped=0.0 2023-12-22 02:09:13,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=348266.6666666667, ans=0.125 2023-12-22 02:09:28,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.95 vs. limit=15.0 2023-12-22 02:09:31,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=348333.3333333333, ans=0.125 2023-12-22 02:09:33,447 INFO [train.py:886] (1/4) Epoch 11, batch 4600, loss[loss=0.01718, audio_tagging_loss=0.01718, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4945861.61 frames. ], batch size: 99, lr: 9.75e-03, grad_scale: 64.0 2023-12-22 02:09:33,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=348400.0, ans=0.0 2023-12-22 02:10:25,540 INFO [train.py:886] (1/4) Epoch 11, batch 4650, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4949728.27 frames. ], batch size: 100, lr: 9.74e-03, grad_scale: 64.0 2023-12-22 02:10:31,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2023-12-22 02:10:39,505 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.626e+01 2.815e+01 2.915e+01 3.579e+01, threshold=5.630e+01, percent-clipped=0.0 2023-12-22 02:10:58,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=12.0 2023-12-22 02:11:17,321 INFO [train.py:886] (1/4) Epoch 11, batch 4700, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24750.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4950361.73 frames. ], batch size: 99, lr: 9.74e-03, grad_scale: 64.0 2023-12-22 02:11:20,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=349066.6666666667, ans=0.125 2023-12-22 02:11:21,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=349066.6666666667, ans=0.125 2023-12-22 02:11:37,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0 2023-12-22 02:12:04,810 INFO [train.py:886] (1/4) Epoch 11, batch 4750, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4945438.96 frames. 
], batch size: 99, lr: 9.73e-03, grad_scale: 64.0 2023-12-22 02:12:09,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349400.0, ans=0.1 2023-12-22 02:12:13,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349466.6666666667, ans=0.1 2023-12-22 02:12:17,672 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 2.685e+01 2.810e+01 2.973e+01 3.471e+01, threshold=5.619e+01, percent-clipped=0.0 2023-12-22 02:12:40,996 INFO [train.py:886] (1/4) Epoch 12, batch 0, loss[loss=0.03449, audio_tagging_loss=0.03449, over 25000.00 frames. ], tot_loss[loss=0.03449, audio_tagging_loss=0.03449, over 25000.00 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 64.0 2023-12-22 02:12:40,996 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 02:12:54,681 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6404, 2.4666, 3.1957, 3.0442, 3.7368, 3.5979, 3.8012, 2.7754], device='cuda:1') 2023-12-22 02:13:02,309 INFO [train.py:917] (1/4) Epoch 12, validation: loss=0.03393, audio_tagging_loss=0.03393, over 3737520.00 frames. 2023-12-22 02:13:02,310 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 02:13:08,288 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:13:14,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=349573.3333333333, ans=0.0 2023-12-22 02:13:24,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=349640.0, ans=0.0 2023-12-22 02:13:25,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=349640.0, ans=0.07 2023-12-22 02:13:41,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=349706.6666666667, ans=0.125 2023-12-22 02:13:45,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=349773.3333333333, ans=0.1 2023-12-22 02:13:53,153 INFO [train.py:886] (1/4) Epoch 12, batch 50, loss[loss=0.01929, audio_tagging_loss=0.01929, over 25000.00 frames. ], tot_loss[loss=0.02364, audio_tagging_loss=0.02364, over 1120115.85 frames. 
], batch size: 100, lr: 9.32e-03, grad_scale: 64.0 2023-12-22 02:13:53,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=349840.0, ans=0.125 2023-12-22 02:14:00,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=349840.0, ans=0.125 2023-12-22 02:14:06,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=349906.6666666667, ans=0.0 2023-12-22 02:14:11,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=349906.6666666667, ans=0.0 2023-12-22 02:14:42,747 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+01 3.132e+01 3.484e+01 4.021e+01 8.947e+01, threshold=6.968e+01, percent-clipped=8.0 2023-12-22 02:14:45,344 INFO [train.py:886] (1/4) Epoch 12, batch 100, loss[loss=0.01633, audio_tagging_loss=0.01633, over 24750.00 frames. ], tot_loss[loss=0.0201, audio_tagging_loss=0.0201, over 1971795.62 frames. ], batch size: 99, lr: 9.32e-03, grad_scale: 64.0 2023-12-22 02:15:00,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350240.0, ans=0.1 2023-12-22 02:15:03,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2023-12-22 02:15:04,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=350306.6666666667, ans=0.0 2023-12-22 02:15:27,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=12.0 2023-12-22 02:15:28,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=350440.0, ans=0.2 2023-12-22 02:15:29,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=350440.0, ans=0.125 2023-12-22 02:15:36,298 INFO [train.py:886] (1/4) Epoch 12, batch 150, loss[loss=0.01638, audio_tagging_loss=0.01638, over 24750.00 frames. ], tot_loss[loss=0.0184, audio_tagging_loss=0.0184, over 2639377.89 frames. ], batch size: 99, lr: 9.31e-03, grad_scale: 64.0 2023-12-22 02:16:03,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-12-22 02:16:04,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=350640.0, ans=0.95 2023-12-22 02:16:07,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.40 vs. 
limit=22.5 2023-12-22 02:16:12,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=350706.6666666667, ans=0.125 2023-12-22 02:16:18,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=350773.3333333333, ans=0.125 2023-12-22 02:16:27,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.700e+01 2.881e+01 3.001e+01 3.518e+01, threshold=5.761e+01, percent-clipped=0.0 2023-12-22 02:16:27,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=350773.3333333333, ans=0.125 2023-12-22 02:16:29,886 INFO [train.py:886] (1/4) Epoch 12, batch 200, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 3158417.26 frames. ], batch size: 100, lr: 9.31e-03, grad_scale: 64.0 2023-12-22 02:17:20,969 INFO [train.py:886] (1/4) Epoch 12, batch 250, loss[loss=0.01868, audio_tagging_loss=0.01868, over 25000.00 frames. ], tot_loss[loss=0.01664, audio_tagging_loss=0.01664, over 3560927.21 frames. ], batch size: 100, lr: 9.30e-03, grad_scale: 64.0 2023-12-22 02:17:24,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351173.3333333333, ans=0.1 2023-12-22 02:17:56,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2023-12-22 02:18:01,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=351440.0, ans=0.125 2023-12-22 02:18:10,375 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.670e+01 2.789e+01 2.930e+01 3.431e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-22 02:18:12,283 INFO [train.py:886] (1/4) Epoch 12, batch 300, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 3866707.75 frames. ], batch size: 99, lr: 9.30e-03, grad_scale: 64.0 2023-12-22 02:18:13,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351506.6666666667, ans=0.1 2023-12-22 02:18:14,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=351506.6666666667, ans=0.125 2023-12-22 02:18:17,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351506.6666666667, ans=0.1 2023-12-22 02:18:51,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=351706.6666666667, ans=0.125 2023-12-22 02:18:59,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=351773.3333333333, ans=0.125 2023-12-22 02:19:02,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=351773.3333333333, ans=0.0 2023-12-22 02:19:03,711 INFO [train.py:886] (1/4) Epoch 12, batch 350, loss[loss=0.01896, audio_tagging_loss=0.01896, over 25000.00 frames. 
], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4096885.26 frames. ], batch size: 100, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:19:04,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=351840.0, ans=0.0 2023-12-22 02:19:11,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=351840.0, ans=0.0 2023-12-22 02:19:52,870 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.603e+01 2.805e+01 2.915e+01 3.693e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 02:19:55,518 INFO [train.py:886] (1/4) Epoch 12, batch 400, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4290971.98 frames. ], batch size: 100, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:19:55,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=352173.3333333333, ans=0.0 2023-12-22 02:19:56,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=352173.3333333333, ans=0.04949747468305833 2023-12-22 02:19:57,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=352173.3333333333, ans=10.0 2023-12-22 02:20:12,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352240.0, ans=0.125 2023-12-22 02:20:43,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=352440.0, ans=0.05 2023-12-22 02:20:48,159 INFO [train.py:886] (1/4) Epoch 12, batch 450, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4437429.07 frames. ], batch size: 100, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:20:51,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352506.6666666667, ans=0.1 2023-12-22 02:20:56,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.51 vs. limit=22.5 2023-12-22 02:21:37,259 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.603e+01 2.721e+01 2.857e+01 3.643e+01, threshold=5.441e+01, percent-clipped=0.0 2023-12-22 02:21:39,865 INFO [train.py:886] (1/4) Epoch 12, batch 500, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4550116.78 frames. ], batch size: 100, lr: 9.28e-03, grad_scale: 64.0 2023-12-22 02:21:50,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=352906.6666666667, ans=0.125 2023-12-22 02:21:54,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.52 vs. 
limit=22.5 2023-12-22 02:21:55,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=352906.6666666667, ans=15.0 2023-12-22 02:22:02,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=352973.3333333333, ans=0.05 2023-12-22 02:22:03,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=352973.3333333333, ans=0.0 2023-12-22 02:22:31,418 INFO [train.py:886] (1/4) Epoch 12, batch 550, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4644822.30 frames. ], batch size: 100, lr: 9.28e-03, grad_scale: 64.0 2023-12-22 02:22:39,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.21 vs. limit=10.0 2023-12-22 02:22:42,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=353240.0, ans=0.0 2023-12-22 02:22:57,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=353306.6666666667, ans=0.2 2023-12-22 02:23:03,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=353373.3333333333, ans=0.125 2023-12-22 02:23:04,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=353373.3333333333, ans=0.0 2023-12-22 02:23:21,418 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.657e+01 2.754e+01 2.932e+01 3.860e+01, threshold=5.508e+01, percent-clipped=0.0 2023-12-22 02:23:23,355 INFO [train.py:886] (1/4) Epoch 12, batch 600, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4711105.69 frames. ], batch size: 99, lr: 9.27e-03, grad_scale: 64.0 2023-12-22 02:23:39,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=353573.3333333333, ans=0.0 2023-12-22 02:23:42,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=353573.3333333333, ans=0.125 2023-12-22 02:23:46,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=15.0 2023-12-22 02:24:15,642 INFO [train.py:886] (1/4) Epoch 12, batch 650, loss[loss=0.01746, audio_tagging_loss=0.01746, over 25000.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4761004.45 frames. 
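
The Whitening records are periodic diagnostics comparing a measured statistic against that module's limit (metric=22.52 vs. limit=22.5 just above). One plausible reading, sketched below, is a measure of how far the channel covariance of an activation is from white: the mean squared eigenvalue of the covariance divided by the square of its mean eigenvalue, which equals 1.0 exactly for an isotropic covariance. The exact formula used by the Whiten module in scaling.py is an assumption here, as is the function name; the log itself only shows that each module carries a (scheduled) whitening_limit it is held to.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """x: (..., num_channels). Returns >= 1.0; 1.0 iff covariance is white."""
        num_channels = x.shape[-1]
        x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)           # zero-mean per group
        cov = x.transpose(1, 2) @ x / x.shape[1]      # (groups, C/g, C/g)
        mean_eig_sq = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1) ** 2
        eig_sq_mean = (cov * cov).sum(dim=(1, 2)) / cov.shape[-1]
        return (eig_sq_mean / mean_eig_sq).mean()

    # Well-sampled random activations are already nearly white:
    print(whitening_metric(torch.randn(8000, 256)))   # ~1.03
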
], batch size: 100, lr: 9.27e-03, grad_scale: 64.0 2023-12-22 02:24:24,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=353840.0, ans=0.125 2023-12-22 02:24:35,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=353973.3333333333, ans=0.2 2023-12-22 02:24:39,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=353973.3333333333, ans=0.125 2023-12-22 02:24:45,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=353973.3333333333, ans=0.0 2023-12-22 02:24:46,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=354040.0, ans=0.0 2023-12-22 02:24:56,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=354106.6666666667, ans=0.0 2023-12-22 02:25:05,086 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.660e+01 2.826e+01 2.982e+01 3.638e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-22 02:25:07,018 INFO [train.py:886] (1/4) Epoch 12, batch 700, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4797727.04 frames. ], batch size: 99, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:25:15,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=354173.3333333333, ans=0.05 2023-12-22 02:25:20,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=354240.0, ans=0.125 2023-12-22 02:25:27,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=354306.6666666667, ans=0.2 2023-12-22 02:25:27,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=354306.6666666667, ans=0.125 2023-12-22 02:25:29,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=354306.6666666667, ans=0.125 2023-12-22 02:25:30,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=354306.6666666667, ans=0.025 2023-12-22 02:25:31,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=354306.6666666667, ans=0.125 2023-12-22 02:25:45,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=12.0 2023-12-22 02:25:47,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=354373.3333333333, ans=0.125 2023-12-22 02:25:49,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=354440.0, ans=0.2 2023-12-22 02:25:53,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=354440.0, ans=0.0 2023-12-22 02:25:57,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2023-12-22 02:25:59,242 INFO [train.py:886] (1/4) Epoch 12, batch 750, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4833484.42 frames. ], batch size: 100, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:26:02,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=354506.6666666667, ans=0.125 2023-12-22 02:26:02,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=354506.6666666667, ans=0.0 2023-12-22 02:26:03,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2023-12-22 02:26:12,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-22 02:26:16,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=354573.3333333333, ans=0.0 2023-12-22 02:26:22,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=354640.0, ans=15.0 2023-12-22 02:26:38,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=354706.6666666667, ans=0.125 2023-12-22 02:26:45,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=354773.3333333333, ans=0.125 2023-12-22 02:26:47,762 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.606e+01 2.795e+01 2.916e+01 3.346e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 02:26:50,426 INFO [train.py:886] (1/4) Epoch 12, batch 800, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4862824.77 frames. 
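
The recurring optim.py WARNINGs summarize the recent distribution of gradient norms as five quartiles plus the clipping threshold currently in force; in the warnings above the threshold is consistently about twice the median quartile, matching Clipping_scale=2.0 (e.g. 2.795e+01 * 2 = 5.590e+01 against the printed threshold=5.591e+01). Below is a hedged sketch of that idea, deriving the clip threshold from a window of recent norms rather than a fixed constant; class and parameter names are illustrative, and icefall's ScaledAdam implements this bookkeeping differently in detail.

    import collections
    import torch

    class QuartileClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def clip(self, params) -> float:
            params = list(params)
            # Measure the total grad norm without clipping (max_norm=inf).
            norm = torch.nn.utils.clip_grad_norm_(
                params, max_norm=float("inf")).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * q[2].item()      # 2x the running median
            if len(self.norms) > 10 and norm > threshold:
                torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
            return threshold

    model = torch.nn.Linear(8, 4)
    clipper = QuartileClipper()
    for _ in range(20):
        model(torch.randn(16, 8)).sum().backward()
        clipper.clip(model.parameters())
        model.zero_grad()
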
], batch size: 100, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:26:53,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354840.0, ans=0.1 2023-12-22 02:26:57,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=354840.0, ans=0.0 2023-12-22 02:27:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=355040.0, ans=0.125 2023-12-22 02:27:40,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=355106.6666666667, ans=0.0 2023-12-22 02:27:41,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=355173.3333333333, ans=0.09899494936611666 2023-12-22 02:27:42,027 INFO [train.py:886] (1/4) Epoch 12, batch 850, loss[loss=0.01585, audio_tagging_loss=0.01585, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4887540.88 frames. ], batch size: 100, lr: 9.25e-03, grad_scale: 64.0 2023-12-22 02:28:18,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=355373.3333333333, ans=0.125 2023-12-22 02:28:22,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=355440.0, ans=0.0 2023-12-22 02:28:24,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2023-12-22 02:28:32,636 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.693e+01 2.795e+01 2.929e+01 3.864e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-22 02:28:34,548 INFO [train.py:886] (1/4) Epoch 12, batch 900, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4898552.46 frames. ], batch size: 99, lr: 9.25e-03, grad_scale: 64.0 2023-12-22 02:28:38,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=355506.6666666667, ans=0.125 2023-12-22 02:28:44,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=355573.3333333333, ans=0.125 2023-12-22 02:28:48,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=355573.3333333333, ans=0.0 2023-12-22 02:28:58,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=355640.0, ans=0.125 2023-12-22 02:29:07,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=355706.6666666667, ans=0.125 2023-12-22 02:29:20,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=355773.3333333333, ans=15.0 2023-12-22 02:29:26,317 INFO [train.py:886] (1/4) Epoch 12, batch 950, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4907125.66 frames. 
], batch size: 99, lr: 9.24e-03, grad_scale: 64.0 2023-12-22 02:29:58,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=356040.0, ans=0.0 2023-12-22 02:30:08,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=356106.6666666667, ans=0.125 2023-12-22 02:30:16,918 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.679e+01 2.792e+01 2.958e+01 3.407e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 02:30:18,816 INFO [train.py:886] (1/4) Epoch 12, batch 1000, loss[loss=0.0125, audio_tagging_loss=0.0125, over 24750.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4910605.01 frames. ], batch size: 99, lr: 9.24e-03, grad_scale: 64.0 2023-12-22 02:30:21,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=356173.3333333333, ans=0.125 2023-12-22 02:30:30,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356240.0, ans=0.1 2023-12-22 02:30:33,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2023-12-22 02:30:36,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356240.0, ans=0.1 2023-12-22 02:30:37,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=356240.0, ans=0.2 2023-12-22 02:30:54,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-12-22 02:31:03,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=356440.0, ans=0.025 2023-12-22 02:31:03,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=356440.0, ans=0.0 2023-12-22 02:31:10,663 INFO [train.py:886] (1/4) Epoch 12, batch 1050, loss[loss=0.0135, audio_tagging_loss=0.0135, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4913649.66 frames. ], batch size: 99, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:31:13,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=356506.6666666667, ans=0.2 2023-12-22 02:31:17,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=356506.6666666667, ans=0.125 2023-12-22 02:31:39,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.37 vs. 
limit=15.0 2023-12-22 02:31:48,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356706.6666666667, ans=0.125 2023-12-22 02:31:56,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=356773.3333333333, ans=0.125 2023-12-22 02:32:00,447 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.676e+01 2.799e+01 2.940e+01 3.260e+01, threshold=5.599e+01, percent-clipped=0.0 2023-12-22 02:32:02,367 INFO [train.py:886] (1/4) Epoch 12, batch 1100, loss[loss=0.01247, audio_tagging_loss=0.01247, over 23942.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4920508.92 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:32:08,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=356840.0, ans=0.125 2023-12-22 02:32:11,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=356906.6666666667, ans=0.09899494936611666 2023-12-22 02:32:14,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=356906.6666666667, ans=0.125 2023-12-22 02:32:32,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=357040.0, ans=0.125 2023-12-22 02:32:37,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=357040.0, ans=0.125 2023-12-22 02:32:52,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=357106.6666666667, ans=0.125 2023-12-22 02:32:54,166 INFO [train.py:886] (1/4) Epoch 12, batch 1150, loss[loss=0.01585, audio_tagging_loss=0.01585, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4927930.06 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:33:10,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357240.0, ans=0.1 2023-12-22 02:33:13,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=357240.0, ans=0.125 2023-12-22 02:33:16,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=357306.6666666667, ans=0.125 2023-12-22 02:33:21,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357306.6666666667, ans=0.1 2023-12-22 02:33:22,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=357306.6666666667, ans=0.0 2023-12-22 02:33:27,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=357373.3333333333, ans=0.125 2023-12-22 02:33:44,381 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.648e+01 2.751e+01 2.936e+01 3.446e+01, threshold=5.502e+01, percent-clipped=0.0 2023-12-22 02:33:46,304 INFO [train.py:886] (1/4) Epoch 12, batch 1200, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. 
], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4938645.93 frames. ], batch size: 100, lr: 9.22e-03, grad_scale: 64.0 2023-12-22 02:33:48,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=357506.6666666667, ans=0.125 2023-12-22 02:33:55,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=357573.3333333333, ans=0.2 2023-12-22 02:34:01,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=357573.3333333333, ans=0.0 2023-12-22 02:34:03,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=357573.3333333333, ans=0.125 2023-12-22 02:34:24,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=12.0 2023-12-22 02:34:26,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=357706.6666666667, ans=0.125 2023-12-22 02:34:38,780 INFO [train.py:886] (1/4) Epoch 12, batch 1250, loss[loss=0.01562, audio_tagging_loss=0.01562, over 22171.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4929895.36 frames. ], batch size: 107, lr: 9.22e-03, grad_scale: 64.0 2023-12-22 02:34:45,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-12-22 02:35:02,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=357973.3333333333, ans=0.2 2023-12-22 02:35:05,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357973.3333333333, ans=0.1 2023-12-22 02:35:19,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-12-22 02:35:28,222 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.720e+01 2.848e+01 2.999e+01 3.607e+01, threshold=5.697e+01, percent-clipped=0.0 2023-12-22 02:35:30,134 INFO [train.py:886] (1/4) Epoch 12, batch 1300, loss[loss=0.01577, audio_tagging_loss=0.01577, over 24750.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4930376.67 frames. ], batch size: 99, lr: 9.21e-03, grad_scale: 64.0 2023-12-22 02:35:45,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=358240.0, ans=0.0 2023-12-22 02:36:02,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=358373.3333333333, ans=0.125 2023-12-22 02:36:02,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2023-12-22 02:36:16,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0 2023-12-22 02:36:22,382 INFO [train.py:886] (1/4) Epoch 12, batch 1350, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4926523.90 frames. 
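
The lr column decays slowly even within this single epoch, from 9.32e-03 near the start of this stretch of the log to 9.10e-03 a few thousand batches later, because the schedule depends on both the batch count and the fractional epoch. The sketch below is the Eden-style schedule that the recipe's lr_batches/lr_epochs settings suggest, written from memory and unverified: treat it as the shape of the decay, not a formula that reproduces the logged values exactly, since the recipe applies further scaling (e.g. warmup and reference-duration adjustment) on top.

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # NOTE: quoted from memory; check against icefall's optim.py before use.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Both factors shrink monotonically, so the lr creeps downward every batch:
    print(eden_lr(0.045, batch=350000, epoch=12.0))
    print(eden_lr(0.045, batch=360000, epoch=12.1))
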
], batch size: 100, lr: 9.21e-03, grad_scale: 64.0 2023-12-22 02:36:34,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=358573.3333333333, ans=0.125 2023-12-22 02:36:58,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=358706.6666666667, ans=0.125 2023-12-22 02:37:12,422 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.686e+01 2.846e+01 3.039e+01 3.537e+01, threshold=5.691e+01, percent-clipped=0.0 2023-12-22 02:37:14,349 INFO [train.py:886] (1/4) Epoch 12, batch 1400, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4933124.39 frames. ], batch size: 100, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:37:16,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=358840.0, ans=0.2 2023-12-22 02:37:28,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=358906.6666666667, ans=0.125 2023-12-22 02:37:29,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.70 vs. limit=10.0 2023-12-22 02:37:31,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=32.33 vs. limit=22.5 2023-12-22 02:37:32,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=358906.6666666667, ans=0.125 2023-12-22 02:37:34,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=358973.3333333333, ans=0.125 2023-12-22 02:37:36,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=358973.3333333333, ans=0.0 2023-12-22 02:37:37,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=358973.3333333333, ans=0.2 2023-12-22 02:38:04,764 INFO [train.py:886] (1/4) Epoch 12, batch 1450, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4937408.09 frames. ], batch size: 99, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:38:05,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=359173.3333333333, ans=0.125 2023-12-22 02:38:17,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=359240.0, ans=0.125 2023-12-22 02:38:32,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=359306.6666666667, ans=0.0 2023-12-22 02:38:34,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2023-12-22 02:38:39,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. 
limit=15.0 2023-12-22 02:38:54,586 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.648e+01 2.789e+01 2.949e+01 3.520e+01, threshold=5.579e+01, percent-clipped=0.0 2023-12-22 02:38:56,519 INFO [train.py:886] (1/4) Epoch 12, batch 1500, loss[loss=0.01497, audio_tagging_loss=0.01497, over 21580.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4941395.01 frames. ], batch size: 107, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:39:30,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2023-12-22 02:39:32,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=359706.6666666667, ans=0.0 2023-12-22 02:39:37,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=359706.6666666667, ans=0.125 2023-12-22 02:39:39,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=359773.3333333333, ans=10.0 2023-12-22 02:39:39,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=359773.3333333333, ans=0.2 2023-12-22 02:39:44,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=359773.3333333333, ans=0.0 2023-12-22 02:39:50,016 INFO [train.py:886] (1/4) Epoch 12, batch 1550, loss[loss=0.01574, audio_tagging_loss=0.01574, over 24750.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4944074.97 frames. ], batch size: 99, lr: 9.19e-03, grad_scale: 64.0 2023-12-22 02:39:56,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=359840.0, ans=0.09899494936611666 2023-12-22 02:40:02,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359906.6666666667, ans=0.1 2023-12-22 02:40:04,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2023-12-22 02:40:08,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=359906.6666666667, ans=0.5 2023-12-22 02:40:10,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2023-12-22 02:40:23,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360040.0, ans=0.1 2023-12-22 02:40:26,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=360040.0, ans=0.125 2023-12-22 02:40:26,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.73 vs. 
limit=10.0 2023-12-22 02:40:31,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360106.6666666667, ans=0.1 2023-12-22 02:40:39,641 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.713e+01 2.839e+01 3.034e+01 3.584e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 02:40:41,603 INFO [train.py:886] (1/4) Epoch 12, batch 1600, loss[loss=0.01482, audio_tagging_loss=0.01482, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4936896.78 frames. ], batch size: 99, lr: 9.19e-03, grad_scale: 64.0 2023-12-22 02:40:42,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=360173.3333333333, ans=0.1 2023-12-22 02:40:43,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=360173.3333333333, ans=0.1 2023-12-22 02:41:05,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360306.6666666667, ans=0.125 2023-12-22 02:41:13,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=360373.3333333333, ans=0.1 2023-12-22 02:41:28,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=360440.0, ans=0.2 2023-12-22 02:41:32,826 INFO [train.py:886] (1/4) Epoch 12, batch 1650, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24750.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4941930.63 frames. ], batch size: 99, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:41:45,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2023-12-22 02:41:48,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=360573.3333333333, ans=0.125 2023-12-22 02:42:02,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360640.0, ans=0.125 2023-12-22 02:42:22,671 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.644e+01 2.817e+01 2.973e+01 3.664e+01, threshold=5.633e+01, percent-clipped=0.0 2023-12-22 02:42:25,265 INFO [train.py:886] (1/4) Epoch 12, batch 1700, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4941497.33 frames. ], batch size: 100, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:42:32,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=360840.0, ans=0.125 2023-12-22 02:42:48,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=360973.3333333333, ans=0.125 2023-12-22 02:42:49,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.25 vs. 
limit=6.0 2023-12-22 02:43:05,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361106.6666666667, ans=0.125 2023-12-22 02:43:13,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=361106.6666666667, ans=0.0 2023-12-22 02:43:16,379 INFO [train.py:886] (1/4) Epoch 12, batch 1750, loss[loss=0.0158, audio_tagging_loss=0.0158, over 24750.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4951638.51 frames. ], batch size: 99, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:43:24,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=361173.3333333333, ans=0.0 2023-12-22 02:43:24,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=361173.3333333333, ans=0.0 2023-12-22 02:43:24,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=361173.3333333333, ans=0.125 2023-12-22 02:43:47,528 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:43:59,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=361440.0, ans=0.2 2023-12-22 02:44:05,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=15.0 2023-12-22 02:44:06,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=361440.0, ans=0.2 2023-12-22 02:44:07,343 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 2.657e+01 2.802e+01 2.979e+01 3.545e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-22 02:44:09,259 INFO [train.py:886] (1/4) Epoch 12, batch 1800, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4957411.85 frames. ], batch size: 100, lr: 9.17e-03, grad_scale: 64.0 2023-12-22 02:44:09,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-12-22 02:44:13,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361506.6666666667, ans=0.0 2023-12-22 02:44:21,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=361573.3333333333, ans=0.125 2023-12-22 02:44:58,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=361773.3333333333, ans=0.0 2023-12-22 02:45:00,408 INFO [train.py:886] (1/4) Epoch 12, batch 1850, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4960829.94 frames. ], batch size: 99, lr: 9.17e-03, grad_scale: 64.0 2023-12-22 02:45:03,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.84 vs. 
limit=15.0 2023-12-22 02:45:16,110 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.844e-01 2023-12-22 02:45:29,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=361973.3333333333, ans=0.125 2023-12-22 02:45:47,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=362106.6666666667, ans=0.125 2023-12-22 02:45:48,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=362106.6666666667, ans=0.0 2023-12-22 02:45:50,666 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.408e+01 2.739e+01 2.901e+01 3.056e+01 4.118e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 02:45:53,271 INFO [train.py:886] (1/4) Epoch 12, batch 1900, loss[loss=0.01703, audio_tagging_loss=0.01703, over 24750.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4955978.54 frames. ], batch size: 99, lr: 9.16e-03, grad_scale: 64.0 2023-12-22 02:45:53,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-12-22 02:46:01,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=362173.3333333333, ans=0.125 2023-12-22 02:46:10,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2023-12-22 02:46:14,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=362306.6666666667, ans=0.0 2023-12-22 02:46:15,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.75 vs. limit=22.5 2023-12-22 02:46:25,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2023-12-22 02:46:33,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=362440.0, ans=0.125 2023-12-22 02:46:33,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2023-12-22 02:46:45,597 INFO [train.py:886] (1/4) Epoch 12, batch 1950, loss[loss=0.01517, audio_tagging_loss=0.01517, over 24750.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4948281.93 frames. 
], batch size: 99, lr: 9.16e-03, grad_scale: 64.0 2023-12-22 02:46:57,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=362573.3333333333, ans=0.035 2023-12-22 02:47:08,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=362640.0, ans=0.125 2023-12-22 02:47:10,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=362640.0, ans=0.125 2023-12-22 02:47:24,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362773.3333333333, ans=0.125 2023-12-22 02:47:27,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=362773.3333333333, ans=0.0 2023-12-22 02:47:33,919 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.661e+01 2.847e+01 3.018e+01 3.786e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 02:47:35,875 INFO [train.py:886] (1/4) Epoch 12, batch 2000, loss[loss=0.0134, audio_tagging_loss=0.0134, over 24750.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4944657.45 frames. ], batch size: 99, lr: 9.16e-03, grad_scale: 128.0 2023-12-22 02:47:56,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=362973.3333333333, ans=0.2 2023-12-22 02:47:57,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=362973.3333333333, ans=0.2 2023-12-22 02:48:00,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=362973.3333333333, ans=0.2 2023-12-22 02:48:04,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=362973.3333333333, ans=0.125 2023-12-22 02:48:04,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=362973.3333333333, ans=0.0 2023-12-22 02:48:07,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=363040.0, ans=0.0 2023-12-22 02:48:14,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363040.0, ans=0.1 2023-12-22 02:48:15,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.29 vs. limit=22.5 2023-12-22 02:48:28,427 INFO [train.py:886] (1/4) Epoch 12, batch 2050, loss[loss=0.0178, audio_tagging_loss=0.0178, over 25000.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4936061.34 frames. ], batch size: 100, lr: 9.15e-03, grad_scale: 64.0 2023-12-22 02:49:17,503 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.662e+01 2.833e+01 2.958e+01 3.467e+01, threshold=5.665e+01, percent-clipped=0.0 2023-12-22 02:49:18,482 INFO [train.py:886] (1/4) Epoch 12, batch 2100, loss[loss=0.0176, audio_tagging_loss=0.0176, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4940647.98 frames. 
], batch size: 100, lr: 9.15e-03, grad_scale: 64.0 2023-12-22 02:49:33,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2023-12-22 02:49:35,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=363573.3333333333, ans=0.2 2023-12-22 02:49:36,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=363573.3333333333, ans=0.125 2023-12-22 02:50:11,417 INFO [train.py:886] (1/4) Epoch 12, batch 2150, loss[loss=0.01772, audio_tagging_loss=0.01772, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4949218.49 frames. ], batch size: 100, lr: 9.14e-03, grad_scale: 64.0 2023-12-22 02:50:17,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=363840.0, ans=0.5 2023-12-22 02:50:20,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=363906.6666666667, ans=0.025 2023-12-22 02:50:25,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-12-22 02:50:33,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=363973.3333333333, ans=0.0 2023-12-22 02:50:43,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=364040.0, ans=0.125 2023-12-22 02:50:49,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2023-12-22 02:51:02,677 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.698e+01 2.881e+01 3.027e+01 3.417e+01, threshold=5.762e+01, percent-clipped=0.0 2023-12-22 02:51:03,658 INFO [train.py:886] (1/4) Epoch 12, batch 2200, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4948067.98 frames. ], batch size: 99, lr: 9.14e-03, grad_scale: 64.0 2023-12-22 02:51:13,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=364240.0, ans=0.125 2023-12-22 02:51:24,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=364306.6666666667, ans=0.1 2023-12-22 02:51:25,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364306.6666666667, ans=0.1 2023-12-22 02:51:37,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=364373.3333333333, ans=0.125 2023-12-22 02:51:37,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=364373.3333333333, ans=0.0 2023-12-22 02:51:55,240 INFO [train.py:886] (1/4) Epoch 12, batch 2250, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4938897.86 frames. 
], batch size: 100, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:52:02,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2023-12-22 02:52:08,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364573.3333333333, ans=0.1 2023-12-22 02:52:08,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2023-12-22 02:52:19,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=364640.0, ans=0.125 2023-12-22 02:52:27,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=364706.6666666667, ans=0.0 2023-12-22 02:52:28,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=364706.6666666667, ans=0.125 2023-12-22 02:52:30,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=364706.6666666667, ans=0.0 2023-12-22 02:52:30,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2023-12-22 02:52:44,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=364773.3333333333, ans=0.2 2023-12-22 02:52:46,432 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.658e+01 2.783e+01 2.953e+01 5.165e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-22 02:52:47,412 INFO [train.py:886] (1/4) Epoch 12, batch 2300, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4944185.99 frames. ], batch size: 99, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:52:50,232 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.105e-02 2023-12-22 02:52:53,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=364840.0, ans=0.125 2023-12-22 02:52:58,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=364906.6666666667, ans=15.0 2023-12-22 02:53:02,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=364906.6666666667, ans=0.125 2023-12-22 02:53:07,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=364973.3333333333, ans=0.125 2023-12-22 02:53:36,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=365106.6666666667, ans=0.125 2023-12-22 02:53:39,771 INFO [train.py:886] (1/4) Epoch 12, batch 2350, loss[loss=0.01848, audio_tagging_loss=0.01848, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4947150.63 frames. 
], batch size: 100, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:53:49,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2023-12-22 02:53:59,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=365306.6666666667, ans=0.125 2023-12-22 02:54:03,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-22 02:54:13,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=365373.3333333333, ans=0.05 2023-12-22 02:54:30,884 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.667e+01 2.832e+01 3.022e+01 3.621e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-22 02:54:31,879 INFO [train.py:886] (1/4) Epoch 12, batch 2400, loss[loss=0.01647, audio_tagging_loss=0.01647, over 24750.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4949458.48 frames. ], batch size: 99, lr: 9.12e-03, grad_scale: 64.0 2023-12-22 02:54:31,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=365506.6666666667, ans=0.125 2023-12-22 02:54:50,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=365573.3333333333, ans=0.125 2023-12-22 02:55:23,517 INFO [train.py:886] (1/4) Epoch 12, batch 2450, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4951863.73 frames. ], batch size: 100, lr: 9.12e-03, grad_scale: 64.0 2023-12-22 02:55:26,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365840.0, ans=0.125 2023-12-22 02:55:32,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=12.0 2023-12-22 02:55:36,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.25 vs. limit=22.5 2023-12-22 02:55:38,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=365906.6666666667, ans=0.1 2023-12-22 02:55:39,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. 
limit=15.0 2023-12-22 02:55:43,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=365973.3333333333, ans=0.125 2023-12-22 02:55:48,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=365973.3333333333, ans=0.2 2023-12-22 02:56:01,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=366040.0, ans=0.0 2023-12-22 02:56:14,273 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.716e+01 2.829e+01 2.975e+01 3.465e+01, threshold=5.658e+01, percent-clipped=0.0 2023-12-22 02:56:15,273 INFO [train.py:886] (1/4) Epoch 12, batch 2500, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4942440.62 frames. ], batch size: 99, lr: 9.11e-03, grad_scale: 64.0 2023-12-22 02:56:17,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=366173.3333333333, ans=0.125 2023-12-22 02:56:24,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=366173.3333333333, ans=0.95 2023-12-22 02:56:41,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366306.6666666667, ans=0.1 2023-12-22 02:56:43,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=366306.6666666667, ans=0.125 2023-12-22 02:56:53,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=366373.3333333333, ans=0.0 2023-12-22 02:57:01,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=366440.0, ans=0.125 2023-12-22 02:57:05,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=366506.6666666667, ans=0.125 2023-12-22 02:57:05,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=366506.6666666667, ans=6.0 2023-12-22 02:57:06,193 INFO [train.py:886] (1/4) Epoch 12, batch 2550, loss[loss=0.01647, audio_tagging_loss=0.01647, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4939979.38 frames. ], batch size: 99, lr: 9.11e-03, grad_scale: 64.0 2023-12-22 02:57:14,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=366506.6666666667, ans=0.125 2023-12-22 02:57:15,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-12-22 02:57:35,118 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:57:51,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.22 vs. 
limit=10.0 2023-12-22 02:57:52,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=366773.3333333333, ans=0.5 2023-12-22 02:57:56,935 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.648e+01 2.817e+01 3.039e+01 3.435e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-22 02:57:57,908 INFO [train.py:886] (1/4) Epoch 12, batch 2600, loss[loss=0.01528, audio_tagging_loss=0.01528, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4935906.23 frames. ], batch size: 99, lr: 9.11e-03, grad_scale: 64.0 2023-12-22 02:58:07,516 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:58:15,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366906.6666666667, ans=0.1 2023-12-22 02:58:30,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=367040.0, ans=0.125 2023-12-22 02:58:47,994 INFO [train.py:886] (1/4) Epoch 12, batch 2650, loss[loss=0.01818, audio_tagging_loss=0.01818, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4937068.97 frames. ], batch size: 100, lr: 9.10e-03, grad_scale: 64.0 2023-12-22 02:59:01,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=367240.0, ans=0.1 2023-12-22 02:59:03,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367240.0, ans=0.125 2023-12-22 02:59:12,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=367306.6666666667, ans=0.125 2023-12-22 02:59:12,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=367306.6666666667, ans=0.125 2023-12-22 02:59:20,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=367373.3333333333, ans=0.125 2023-12-22 02:59:22,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=367373.3333333333, ans=0.125 2023-12-22 02:59:24,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=367373.3333333333, ans=0.125 2023-12-22 02:59:33,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=367440.0, ans=0.2 2023-12-22 02:59:36,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-12-22 02:59:38,847 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.659e+01 2.800e+01 2.953e+01 3.305e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-22 02:59:39,832 INFO [train.py:886] (1/4) Epoch 12, batch 2700, loss[loss=0.0177, audio_tagging_loss=0.0177, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4940873.81 frames. 
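
Note on the WARNING lines from optim.py:484: each one reports five order statistics of recent gradient norms (they read as min, 25%, median, 75%, max) plus a clipping threshold, and in this log the threshold is exactly Clipping_scale times the median quartile (for example 2.0 x 2.817e+01 = 5.634e+01 just above); percent-clipped is the share of recent batches whose norm exceeded the threshold. A minimal sketch of that scheme, assuming a sliding window of norms; this is an illustration, not the actual icefall optim.py code:

    from collections import deque

    import torch


    class QuartileGradClipper:
        """Clip gradients against clipping_scale * median of recent norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 50):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # sliding window of grad norms
            self.num_seen = 0
            self.num_clipped = 0

        def __call__(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(
                torch.stack([p.grad.detach().norm() for p in params])
            ).item()
            self.norms.append(norm)
            self.num_seen += 1
            s = sorted(self.norms)
            # min / 25% / 50% / 75% / max, as printed in the WARNING lines
            quartiles = [s[0], s[len(s) // 4], s[len(s) // 2],
                         s[(3 * len(s)) // 4], s[-1]]
            threshold = self.clipping_scale * quartiles[2]
            if norm > threshold:
                self.num_clipped += 1
                for p in params:  # scale gradients down to the threshold
                    p.grad.mul_(threshold / norm)
            return norm
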
], batch size: 100, lr: 9.10e-03, grad_scale: 64.0 2023-12-22 02:59:46,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=367506.6666666667, ans=0.125 2023-12-22 02:59:52,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=367573.3333333333, ans=0.0 2023-12-22 03:00:20,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=367773.3333333333, ans=0.125 2023-12-22 03:00:31,443 INFO [train.py:886] (1/4) Epoch 12, batch 2750, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4942118.23 frames. ], batch size: 100, lr: 9.09e-03, grad_scale: 64.0 2023-12-22 03:00:43,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=367906.6666666667, ans=10.0 2023-12-22 03:00:48,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-12-22 03:00:57,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=367973.3333333333, ans=0.09899494936611666 2023-12-22 03:01:07,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=368040.0, ans=0.1 2023-12-22 03:01:07,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=368040.0, ans=0.04949747468305833 2023-12-22 03:01:16,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5 2023-12-22 03:01:22,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2023-12-22 03:01:22,370 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.734e+01 2.863e+01 2.984e+01 3.983e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 03:01:23,374 INFO [train.py:886] (1/4) Epoch 12, batch 2800, loss[loss=0.01417, audio_tagging_loss=0.01417, over 24750.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4949694.94 frames. ], batch size: 99, lr: 9.09e-03, grad_scale: 64.0 2023-12-22 03:01:30,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=368173.3333333333, ans=0.2 2023-12-22 03:01:35,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=368240.0, ans=0.125 2023-12-22 03:02:11,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=368440.0, ans=0.2 2023-12-22 03:02:16,471 INFO [train.py:886] (1/4) Epoch 12, batch 2850, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4941692.50 frames. 
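
Note on the ScheduledFloat lines from scaling.py:213: each prints a module hyperparameter ("ans") as a function of batch_count; dropout probabilities, skip rates and balancer limits are scheduled over training rather than fixed. The printed values are consistent with piecewise-linear interpolation between breakpoints. A hedged sketch of that idea, where the breakpoints below are invented for illustration and the real scaling.py class carries more machinery:

    import bisect


    class ScheduledFloat:
        """Piecewise-linear value of batch_count; points are (x, y) pairs."""

        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)


    # Invented breakpoints: anneal a dropout from 0.3 to 0.1 over 20k batches.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(366306.67))  # -> 0.1, the "ans" plateau seen above
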
], batch size: 99, lr: 9.09e-03, grad_scale: 64.0 2023-12-22 03:02:25,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=368573.3333333333, ans=0.125 2023-12-22 03:02:36,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2023-12-22 03:02:38,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=368640.0, ans=0.125 2023-12-22 03:02:46,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=368706.6666666667, ans=0.125 2023-12-22 03:02:46,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=12.0 2023-12-22 03:02:49,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.96 vs. limit=22.5 2023-12-22 03:03:00,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2023-12-22 03:03:06,050 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.672e+01 2.791e+01 2.941e+01 3.403e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 03:03:07,714 INFO [train.py:886] (1/4) Epoch 12, batch 2900, loss[loss=0.01562, audio_tagging_loss=0.01562, over 24750.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4939799.16 frames. ], batch size: 99, lr: 9.08e-03, grad_scale: 64.0 2023-12-22 03:03:12,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=368840.0, ans=0.07 2023-12-22 03:03:53,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=369106.6666666667, ans=0.0 2023-12-22 03:03:59,029 INFO [train.py:886] (1/4) Epoch 12, batch 2950, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4943805.54 frames. ], batch size: 100, lr: 9.08e-03, grad_scale: 64.0 2023-12-22 03:04:07,415 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:04:11,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369240.0, ans=0.1 2023-12-22 03:04:16,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2023-12-22 03:04:31,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2023-12-22 03:04:34,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369373.3333333333, ans=0.1 2023-12-22 03:04:39,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=369373.3333333333, ans=0.2 2023-12-22 03:04:40,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. 
limit=15.0 2023-12-22 03:04:50,023 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.641e+01 2.799e+01 3.004e+01 3.517e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-22 03:04:51,018 INFO [train.py:886] (1/4) Epoch 12, batch 3000, loss[loss=0.01926, audio_tagging_loss=0.01926, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4951440.85 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0 2023-12-22 03:04:51,019 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 03:05:12,277 INFO [train.py:917] (1/4) Epoch 12, validation: loss=0.03429, audio_tagging_loss=0.03429, over 3737520.00 frames. 2023-12-22 03:05:12,278 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 03:05:32,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-12-22 03:05:37,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2023-12-22 03:05:45,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-12-22 03:05:52,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=369706.6666666667, ans=0.025 2023-12-22 03:06:02,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-12-22 03:06:03,276 INFO [train.py:886] (1/4) Epoch 12, batch 3050, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4951205.76 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0 2023-12-22 03:06:35,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=370040.0, ans=0.0 2023-12-22 03:06:54,448 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.661e+01 2.811e+01 2.974e+01 3.661e+01, threshold=5.621e+01, percent-clipped=0.0 2023-12-22 03:06:55,405 INFO [train.py:886] (1/4) Epoch 12, batch 3100, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4951331.75 frames. ], batch size: 99, lr: 9.07e-03, grad_scale: 64.0 2023-12-22 03:07:16,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=370306.6666666667, ans=0.2 2023-12-22 03:07:39,624 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:07:45,194 INFO [train.py:886] (1/4) Epoch 12, batch 3150, loss[loss=0.01639, audio_tagging_loss=0.01639, over 24947.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4951604.19 frames. ], batch size: 100, lr: 9.06e-03, grad_scale: 64.0 2023-12-22 03:07:54,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. 
limit=15.0 2023-12-22 03:07:55,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=370573.3333333333, ans=0.125 2023-12-22 03:08:35,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=370773.3333333333, ans=0.125 2023-12-22 03:08:37,466 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.651e+01 2.845e+01 3.043e+01 3.695e+01, threshold=5.689e+01, percent-clipped=0.0 2023-12-22 03:08:38,470 INFO [train.py:886] (1/4) Epoch 12, batch 3200, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.01478, audio_tagging_loss=0.01478, over 4946663.50 frames. ], batch size: 99, lr: 9.06e-03, grad_scale: 64.0 2023-12-22 03:08:46,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=370840.0, ans=0.5 2023-12-22 03:08:48,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370906.6666666667, ans=0.1 2023-12-22 03:09:06,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=370973.3333333333, ans=0.0 2023-12-22 03:09:19,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=371106.6666666667, ans=0.125 2023-12-22 03:09:23,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2023-12-22 03:09:29,977 INFO [train.py:886] (1/4) Epoch 12, batch 3250, loss[loss=0.01654, audio_tagging_loss=0.01654, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4949777.33 frames. ], batch size: 99, lr: 9.05e-03, grad_scale: 64.0 2023-12-22 03:09:34,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2023-12-22 03:09:35,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=371173.3333333333, ans=0.0 2023-12-22 03:09:36,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=6.0 2023-12-22 03:09:38,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.309e-02 2023-12-22 03:09:38,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=371173.3333333333, ans=0.2 2023-12-22 03:09:50,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2023-12-22 03:10:05,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=371373.3333333333, ans=0.125 2023-12-22 03:10:14,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371440.0, ans=0.1 2023-12-22 03:10:16,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.40 vs. 
limit=15.0 2023-12-22 03:10:18,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=371440.0, ans=0.1 2023-12-22 03:10:18,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=371440.0, ans=0.0 2023-12-22 03:10:20,352 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.685e+01 2.819e+01 2.963e+01 3.522e+01, threshold=5.637e+01, percent-clipped=0.0 2023-12-22 03:10:21,387 INFO [train.py:886] (1/4) Epoch 12, batch 3300, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4948347.76 frames. ], batch size: 100, lr: 9.05e-03, grad_scale: 64.0 2023-12-22 03:10:27,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2023-12-22 03:10:40,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=371573.3333333333, ans=0.2 2023-12-22 03:10:47,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=371640.0, ans=0.2 2023-12-22 03:10:47,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.43 vs. limit=12.0 2023-12-22 03:10:48,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0 2023-12-22 03:11:14,000 INFO [train.py:886] (1/4) Epoch 12, batch 3350, loss[loss=0.01548, audio_tagging_loss=0.01548, over 24068.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4947453.99 frames. ], batch size: 100, lr: 9.05e-03, grad_scale: 64.0 2023-12-22 03:11:21,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=371840.0, ans=0.0 2023-12-22 03:11:22,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.49 vs. limit=15.0 2023-12-22 03:11:23,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2023-12-22 03:11:25,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=371906.6666666667, ans=0.2 2023-12-22 03:11:38,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=371973.3333333333, ans=0.1 2023-12-22 03:11:44,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.05 vs. 
limit=15.0 2023-12-22 03:11:50,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372040.0, ans=0.1 2023-12-22 03:12:01,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=372106.6666666667, ans=0.0 2023-12-22 03:12:02,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=372106.6666666667, ans=0.125 2023-12-22 03:12:04,925 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.659e+01 2.803e+01 3.006e+01 4.806e+01, threshold=5.606e+01, percent-clipped=0.0 2023-12-22 03:12:05,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=372173.3333333333, ans=0.125 2023-12-22 03:12:05,924 INFO [train.py:886] (1/4) Epoch 12, batch 3400, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4950218.46 frames. ], batch size: 100, lr: 9.04e-03, grad_scale: 64.0 2023-12-22 03:12:13,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=372173.3333333333, ans=0.125 2023-12-22 03:12:26,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2023-12-22 03:12:30,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=372306.6666666667, ans=0.0 2023-12-22 03:12:33,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=372306.6666666667, ans=0.0 2023-12-22 03:12:33,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2023-12-22 03:12:58,609 INFO [train.py:886] (1/4) Epoch 12, batch 3450, loss[loss=0.01548, audio_tagging_loss=0.01548, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4942959.20 frames. ], batch size: 99, lr: 9.04e-03, grad_scale: 64.0 2023-12-22 03:13:05,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372506.6666666667, ans=0.0 2023-12-22 03:13:14,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372573.3333333333, ans=0.1 2023-12-22 03:13:24,545 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.841e-01 2023-12-22 03:13:28,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=372706.6666666667, ans=0.05 2023-12-22 03:13:46,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. 
limit=15.0 2023-12-22 03:13:49,405 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.514e+01 2.768e+01 2.902e+01 3.061e+01 3.695e+01, threshold=5.804e+01, percent-clipped=0.0 2023-12-22 03:13:49,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=372840.0, ans=0.2 2023-12-22 03:13:51,115 INFO [train.py:886] (1/4) Epoch 12, batch 3500, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4939919.98 frames. ], batch size: 99, lr: 9.03e-03, grad_scale: 64.0 2023-12-22 03:13:53,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=372840.0, ans=0.125 2023-12-22 03:14:10,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=372973.3333333333, ans=0.0 2023-12-22 03:14:11,763 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:14:18,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372973.3333333333, ans=0.1 2023-12-22 03:14:19,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=372973.3333333333, ans=0.125 2023-12-22 03:14:41,955 INFO [train.py:886] (1/4) Epoch 12, batch 3550, loss[loss=0.01729, audio_tagging_loss=0.01729, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4944108.81 frames. ], batch size: 100, lr: 9.03e-03, grad_scale: 64.0 2023-12-22 03:15:11,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0 2023-12-22 03:15:31,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-12-22 03:15:35,219 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.600e+01 2.744e+01 2.864e+01 3.557e+01, threshold=5.489e+01, percent-clipped=0.0 2023-12-22 03:15:36,198 INFO [train.py:886] (1/4) Epoch 12, batch 3600, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4948596.76 frames. ], batch size: 100, lr: 9.03e-03, grad_scale: 64.0 2023-12-22 03:15:55,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=373573.3333333333, ans=0.125 2023-12-22 03:16:13,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373706.6666666667, ans=0.125 2023-12-22 03:16:27,969 INFO [train.py:886] (1/4) Epoch 12, batch 3650, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4948408.45 frames. ], batch size: 100, lr: 9.02e-03, grad_scale: 64.0 2023-12-22 03:16:28,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. 
limit=6.0 2023-12-22 03:16:33,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=15.0 2023-12-22 03:16:36,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=373840.0, ans=0.0 2023-12-22 03:16:42,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=373906.6666666667, ans=0.125 2023-12-22 03:16:43,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=373906.6666666667, ans=0.025 2023-12-22 03:17:05,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-12-22 03:17:18,604 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.722e+01 2.831e+01 2.983e+01 3.559e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-22 03:17:19,592 INFO [train.py:886] (1/4) Epoch 12, batch 3700, loss[loss=0.01668, audio_tagging_loss=0.01668, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4955917.02 frames. ], batch size: 100, lr: 9.02e-03, grad_scale: 64.0 2023-12-22 03:17:29,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=374240.0, ans=0.125 2023-12-22 03:17:46,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=374306.6666666667, ans=0.2 2023-12-22 03:18:09,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2023-12-22 03:18:12,531 INFO [train.py:886] (1/4) Epoch 12, batch 3750, loss[loss=0.0195, audio_tagging_loss=0.0195, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4953020.04 frames. ], batch size: 99, lr: 9.01e-03, grad_scale: 64.0 2023-12-22 03:18:13,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=374506.6666666667, ans=0.125 2023-12-22 03:18:35,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374640.0, ans=0.1 2023-12-22 03:18:39,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.89 vs. 
limit=15.0 2023-12-22 03:18:40,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=374640.0, ans=0.0 2023-12-22 03:18:52,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=374773.3333333333, ans=0.1 2023-12-22 03:18:55,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=374773.3333333333, ans=0.125 2023-12-22 03:19:00,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=374773.3333333333, ans=10.0 2023-12-22 03:19:01,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=374773.3333333333, ans=0.0 2023-12-22 03:19:02,676 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.712e+01 2.843e+01 3.020e+01 3.497e+01, threshold=5.685e+01, percent-clipped=0.0 2023-12-22 03:19:03,671 INFO [train.py:886] (1/4) Epoch 12, batch 3800, loss[loss=0.01371, audio_tagging_loss=0.01371, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4947330.88 frames. ], batch size: 99, lr: 9.01e-03, grad_scale: 64.0 2023-12-22 03:19:05,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=374840.0, ans=0.125 2023-12-22 03:19:11,202 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.596e-03 2023-12-22 03:19:13,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=374906.6666666667, ans=0.0 2023-12-22 03:19:19,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.21 vs. limit=22.5 2023-12-22 03:19:25,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374973.3333333333, ans=0.1 2023-12-22 03:19:42,706 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:19:43,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=375040.0, ans=0.125 2023-12-22 03:19:44,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0 2023-12-22 03:19:55,532 INFO [train.py:886] (1/4) Epoch 12, batch 3850, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4946701.13 frames. ], batch size: 100, lr: 9.01e-03, grad_scale: 64.0 2023-12-22 03:19:58,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375173.3333333333, ans=0.1 2023-12-22 03:20:16,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. 
limit=6.0 2023-12-22 03:20:17,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=375306.6666666667, ans=0.09899494936611666 2023-12-22 03:20:38,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=375440.0, ans=0.0 2023-12-22 03:20:39,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=375440.0, ans=0.125 2023-12-22 03:20:44,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375440.0, ans=0.0 2023-12-22 03:20:46,917 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.664e+01 2.840e+01 2.981e+01 3.799e+01, threshold=5.681e+01, percent-clipped=0.0 2023-12-22 03:20:47,899 INFO [train.py:886] (1/4) Epoch 12, batch 3900, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4950523.32 frames. ], batch size: 99, lr: 9.00e-03, grad_scale: 64.0 2023-12-22 03:20:49,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=375506.6666666667, ans=0.2 2023-12-22 03:20:53,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=375506.6666666667, ans=0.0 2023-12-22 03:21:04,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=375573.3333333333, ans=0.0 2023-12-22 03:21:04,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=12.0 2023-12-22 03:21:06,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-12-22 03:21:26,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=375706.6666666667, ans=0.2 2023-12-22 03:21:39,169 INFO [train.py:886] (1/4) Epoch 12, batch 3950, loss[loss=0.0168, audio_tagging_loss=0.0168, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4959680.29 frames. ], batch size: 100, lr: 9.00e-03, grad_scale: 64.0 2023-12-22 03:21:58,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=375906.6666666667, ans=0.125 2023-12-22 03:22:06,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-12-22 03:22:11,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=376040.0, ans=0.0 2023-12-22 03:22:30,717 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.669e+01 2.811e+01 2.989e+01 3.723e+01, threshold=5.621e+01, percent-clipped=0.0 2023-12-22 03:22:31,696 INFO [train.py:886] (1/4) Epoch 12, batch 4000, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4958315.20 frames. 
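
Note on the lr field: it drifts down smoothly within an epoch (9.13e-03 toward 8.94e-03 across epoch 12 here) and takes a larger step down at epoch boundaries (8.59e-03 when epoch 13 starts below). That two-time-scale pattern matches a scheduler that discounts both batch count and epoch count, the shape icefall's Eden scheduler uses. A sketch of such a schedule; the constants and warmup handling are assumptions, and the logged values are not reproduced exactly:

    def lr_value(base_lr: float, batch: int, epoch: int,
                 lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth inverse-power decay in the batch index...
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        # ...multiplied by a slower decay in the epoch index, which produces
        # the step-down seen at epoch boundaries.
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
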
], batch size: 100, lr: 8.99e-03, grad_scale: 64.0 2023-12-22 03:22:36,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376173.3333333333, ans=0.1 2023-12-22 03:22:37,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=15.0 2023-12-22 03:22:45,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2023-12-22 03:22:52,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=376306.6666666667, ans=0.0 2023-12-22 03:22:54,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=376306.6666666667, ans=0.0 2023-12-22 03:22:58,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=376306.6666666667, ans=0.125 2023-12-22 03:23:09,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376373.3333333333, ans=0.1 2023-12-22 03:23:22,938 INFO [train.py:886] (1/4) Epoch 12, batch 4050, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4958260.71 frames. ], batch size: 99, lr: 8.99e-03, grad_scale: 128.0 2023-12-22 03:23:33,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=376573.3333333333, ans=0.125 2023-12-22 03:23:37,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=376573.3333333333, ans=0.125 2023-12-22 03:23:41,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376573.3333333333, ans=0.1 2023-12-22 03:23:46,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0 2023-12-22 03:23:50,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=376640.0, ans=0.2 2023-12-22 03:23:57,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=376706.6666666667, ans=0.2 2023-12-22 03:24:07,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376773.3333333333, ans=0.125 2023-12-22 03:24:09,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=376773.3333333333, ans=0.125 2023-12-22 03:24:10,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=376773.3333333333, ans=0.125 2023-12-22 03:24:14,135 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.760e+01 2.894e+01 3.048e+01 3.546e+01, threshold=5.789e+01, percent-clipped=0.0 2023-12-22 03:24:14,161 INFO [train.py:886] (1/4) Epoch 12, batch 4100, loss[loss=0.01375, audio_tagging_loss=0.01375, over 24750.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4952095.16 frames. 
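
Note on grad_scale: this is dynamic loss scaling for fp16 training. The scale sits at 64.0, doubles to 128.0 at batch 4050 after a long run of finite gradients, and is back at 64.0 by batch 4100, which suggests an overflow was hit in between. A standard PyTorch sketch of that behaviour; the model, criterion and growth_interval here are assumptions, not the recipe's exact setup:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=64.0, growth_interval=2000)

    def train_step(model, optimizer, criterion, features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # doubles after enough good steps,
                                       # halves when an overflow is detected
        return loss.detach(), scaler.get_scale()
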
], batch size: 99, lr: 8.99e-03, grad_scale: 64.0 2023-12-22 03:24:25,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=376906.6666666667, ans=0.125 2023-12-22 03:24:44,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2023-12-22 03:24:51,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=377040.0, ans=0.125 2023-12-22 03:25:05,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=377173.3333333333, ans=0.0 2023-12-22 03:25:07,066 INFO [train.py:886] (1/4) Epoch 12, batch 4150, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4945514.45 frames. ], batch size: 99, lr: 8.98e-03, grad_scale: 64.0 2023-12-22 03:25:46,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2023-12-22 03:25:58,381 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.599e+01 2.785e+01 2.978e+01 3.505e+01, threshold=5.570e+01, percent-clipped=0.0 2023-12-22 03:25:58,407 INFO [train.py:886] (1/4) Epoch 12, batch 4200, loss[loss=0.01735, audio_tagging_loss=0.01735, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4946044.43 frames. ], batch size: 100, lr: 8.98e-03, grad_scale: 64.0 2023-12-22 03:25:58,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=377506.6666666667, ans=0.125 2023-12-22 03:26:11,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=15.0 2023-12-22 03:26:40,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=377773.3333333333, ans=0.125 2023-12-22 03:26:42,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=377773.3333333333, ans=0.125 2023-12-22 03:26:50,239 INFO [train.py:886] (1/4) Epoch 12, batch 4250, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4944320.84 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0 2023-12-22 03:27:05,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=377906.6666666667, ans=0.0 2023-12-22 03:27:23,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=12.0 2023-12-22 03:27:34,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2023-12-22 03:27:41,793 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.668e+01 2.819e+01 2.959e+01 3.495e+01, threshold=5.638e+01, percent-clipped=0.0 2023-12-22 03:27:41,821 INFO [train.py:886] (1/4) Epoch 12, batch 4300, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. 
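
Note on the Whitening lines from scaling.py:1022: each compares a per-module statistic ("metric") against a limit. The metric summarizes how far the channel covariance of a layer's activations is from a scaled identity; larger values mean the variance is concentrating in fewer directions. One simple whiteness measure with that behaviour, equal to 1.0 for perfectly white features and to num_channels in the fully collapsed case, is sketched below; the exact formula in scaling.py may differ:

    import torch


    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations for one whitening group."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]  # channel covariance, (C, C)
        d = cov.shape[0]
        # 1.0 when cov is proportional to the identity ("white"); approaches
        # d when all of the variance collapses into a single direction.
        return (d * (cov * cov).sum() / cov.trace() ** 2).item()


    print(whitening_metric(torch.randn(1000, 512)))               # ~ 1
    print(whitening_metric(torch.randn(1000, 1).repeat(1, 512)))  # ~ 512
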
], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4945964.65 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0 2023-12-22 03:27:43,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=378173.3333333333, ans=0.0 2023-12-22 03:27:44,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.89 vs. limit=10.0 2023-12-22 03:28:03,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=15.0 2023-12-22 03:28:19,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=378373.3333333333, ans=0.125 2023-12-22 03:28:21,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378373.3333333333, ans=0.1 2023-12-22 03:28:30,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=378440.0, ans=0.0 2023-12-22 03:28:32,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=378506.6666666667, ans=0.1 2023-12-22 03:28:32,933 INFO [train.py:886] (1/4) Epoch 12, batch 4350, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4944975.20 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0 2023-12-22 03:28:33,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=378506.6666666667, ans=0.2 2023-12-22 03:28:56,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=378640.0, ans=0.125 2023-12-22 03:29:00,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=378640.0, ans=0.125 2023-12-22 03:29:14,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=378773.3333333333, ans=0.0 2023-12-22 03:29:19,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=378773.3333333333, ans=0.2 2023-12-22 03:29:23,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=378773.3333333333, ans=0.0 2023-12-22 03:29:25,512 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.769e+01 2.901e+01 3.056e+01 3.737e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 03:29:25,539 INFO [train.py:886] (1/4) Epoch 12, batch 4400, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4944921.90 frames. ], batch size: 99, lr: 8.96e-03, grad_scale: 64.0 2023-12-22 03:29:38,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2023-12-22 03:29:43,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. 
limit=10.0 2023-12-22 03:29:47,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=378973.3333333333, ans=15.0 2023-12-22 03:29:48,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=378973.3333333333, ans=0.07 2023-12-22 03:30:01,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=379040.0, ans=0.125 2023-12-22 03:30:17,497 INFO [train.py:886] (1/4) Epoch 12, batch 4450, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4944140.10 frames. ], batch size: 100, lr: 8.96e-03, grad_scale: 64.0 2023-12-22 03:30:17,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=12.0 2023-12-22 03:30:31,493 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:30:49,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=379373.3333333333, ans=0.1 2023-12-22 03:31:09,037 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.635e+01 2.785e+01 2.931e+01 3.625e+01, threshold=5.571e+01, percent-clipped=0.0 2023-12-22 03:31:09,063 INFO [train.py:886] (1/4) Epoch 12, batch 4500, loss[loss=0.0177, audio_tagging_loss=0.0177, over 22292.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4939326.85 frames. ], batch size: 107, lr: 8.96e-03, grad_scale: 64.0 2023-12-22 03:31:13,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.47 vs. limit=22.5 2023-12-22 03:31:14,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=379506.6666666667, ans=0.125 2023-12-22 03:31:22,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=379573.3333333333, ans=0.125 2023-12-22 03:31:27,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=379573.3333333333, ans=0.2 2023-12-22 03:31:28,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2023-12-22 03:31:40,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2023-12-22 03:31:46,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=379706.6666666667, ans=0.125 2023-12-22 03:31:55,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-12-22 03:32:00,482 INFO [train.py:886] (1/4) Epoch 12, batch 4550, loss[loss=0.01724, audio_tagging_loss=0.01724, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4945349.86 frames. 
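
Note on the loss values: throughout these lines loss and audio_tagging_loss are identical, i.e. the tagging objective is the only loss term. For AudioSet-style tagging over 527 event classes the usual objective is multi-label binary cross-entropy on per-clip logits; a hedged sketch, where the reduction and the frame-count normalization implied by the "over N frames" bookkeeping are assumptions:

    import torch
    import torch.nn.functional as F


    def audio_tagging_loss(logits: torch.Tensor,
                           targets: torch.Tensor) -> torch.Tensor:
        """logits, targets: (batch, num_events); targets are multi-hot 0/1."""
        return F.binary_cross_entropy_with_logits(logits, targets,
                                                  reduction="mean")


    logits = torch.randn(100, 527)                   # 527 AudioSet classes
    targets = (torch.rand(100, 527) > 0.99).float()  # sparse multi-hot labels
    print(audio_tagging_loss(logits, targets))
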
], batch size: 100, lr: 8.95e-03, grad_scale: 64.0 2023-12-22 03:32:10,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379906.6666666667, ans=0.125 2023-12-22 03:32:15,465 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:32:23,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=379973.3333333333, ans=0.125 2023-12-22 03:32:32,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=380040.0, ans=0.05 2023-12-22 03:32:32,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=15.0 2023-12-22 03:32:39,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=380040.0, ans=0.05 2023-12-22 03:32:51,247 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.640e+01 2.744e+01 2.893e+01 3.340e+01, threshold=5.488e+01, percent-clipped=0.0 2023-12-22 03:32:51,272 INFO [train.py:886] (1/4) Epoch 12, batch 4600, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4948645.16 frames. ], batch size: 100, lr: 8.95e-03, grad_scale: 64.0 2023-12-22 03:32:53,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=380173.3333333333, ans=0.0 2023-12-22 03:32:57,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=380173.3333333333, ans=0.0 2023-12-22 03:33:23,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=380373.3333333333, ans=0.0 2023-12-22 03:33:26,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-12-22 03:33:34,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=380440.0, ans=0.125 2023-12-22 03:33:42,650 INFO [train.py:886] (1/4) Epoch 12, batch 4650, loss[loss=0.01533, audio_tagging_loss=0.01533, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4956736.83 frames. 
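
Note on the balancer entries: many ScheduledFloat names above belong to balancer modules, with constraints such as min_positive, max_positive, min_abs and max_abs (the ans=0.05 and ans=0.95 values printed here are those limits). A balancer is the identity in the forward pass and, in the backward pass, adds a small correction that nudges per-channel activation statistics into the allowed range. A much-simplified illustration of that idea, not the real scaling.py Balancer:

    import torch


    class SimpleBalancer(torch.autograd.Function):
        """Identity forward; backward nudges each channel's fraction of
        positive values into [min_positive, max_positive]."""

        @staticmethod
        def forward(ctx, x, min_positive=0.05, max_positive=0.95,
                    nudge_scale=0.01):
            ctx.save_for_backward(x)
            ctx.cfg = (min_positive, max_positive, nudge_scale)
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            min_pos, max_pos, nudge_scale = ctx.cfg
            # x: (batch, ..., channels); fraction of positives per channel
            pos = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
            # +1 where a channel has too few positives, -1 where too many
            direction = (pos < min_pos).float() - (pos > max_pos).float()
            nudge = -nudge_scale * direction * grad_output.abs().mean()
            return grad_output + nudge, None, None, None


    y = SimpleBalancer.apply(torch.randn(8, 256, requires_grad=True))
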
], batch size: 100, lr: 8.94e-03, grad_scale: 64.0 2023-12-22 03:33:55,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=380573.3333333333, ans=0.125 2023-12-22 03:34:10,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380640.0, ans=0.1 2023-12-22 03:34:20,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=380706.6666666667, ans=0.125 2023-12-22 03:34:28,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=380773.3333333333, ans=10.0 2023-12-22 03:34:32,596 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+01 2.686e+01 2.822e+01 3.003e+01 3.524e+01, threshold=5.643e+01, percent-clipped=0.0 2023-12-22 03:34:32,621 INFO [train.py:886] (1/4) Epoch 12, batch 4700, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4957515.30 frames. ], batch size: 99, lr: 8.94e-03, grad_scale: 64.0 2023-12-22 03:35:03,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=381040.0, ans=0.125 2023-12-22 03:35:20,081 INFO [train.py:886] (1/4) Epoch 12, batch 4750, loss[loss=0.01262, audio_tagging_loss=0.01262, over 24750.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4956791.85 frames. ], batch size: 99, lr: 8.94e-03, grad_scale: 64.0 2023-12-22 03:35:57,808 INFO [train.py:886] (1/4) Epoch 13, batch 0, loss[loss=0.02873, audio_tagging_loss=0.02873, over 24084.00 frames. ], tot_loss[loss=0.02873, audio_tagging_loss=0.02873, over 24084.00 frames. ], batch size: 100, lr: 8.59e-03, grad_scale: 32.0 2023-12-22 03:35:57,808 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 03:36:18,412 INFO [train.py:917] (1/4) Epoch 13, validation: loss=0.03383, audio_tagging_loss=0.03383, over 3737520.00 frames. 2023-12-22 03:36:18,412 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 03:36:18,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=381280.0, ans=0.05 2023-12-22 03:36:22,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=381280.0, ans=0.0 2023-12-22 03:36:36,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=381346.6666666667, ans=0.0 2023-12-22 03:36:36,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=15.0 2023-12-22 03:36:44,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=381413.3333333333, ans=0.95 2023-12-22 03:36:45,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=381413.3333333333, ans=0.0 2023-12-22 03:36:50,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. 
limit=15.0 2023-12-22 03:36:54,915 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.829e+01 3.058e+01 3.866e+01 8.495e+01, threshold=6.115e+01, percent-clipped=6.0 2023-12-22 03:36:55,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=381480.0, ans=0.0 2023-12-22 03:37:10,121 INFO [train.py:886] (1/4) Epoch 13, batch 50, loss[loss=0.02434, audio_tagging_loss=0.02434, over 25000.00 frames. ], tot_loss[loss=0.02354, audio_tagging_loss=0.02354, over 1115477.59 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0 2023-12-22 03:37:16,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-22 03:37:27,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5 2023-12-22 03:37:30,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=381680.0, ans=0.0 2023-12-22 03:37:32,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=381746.6666666667, ans=0.0 2023-12-22 03:37:32,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=381746.6666666667, ans=0.035 2023-12-22 03:37:34,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=381746.6666666667, ans=0.125 2023-12-22 03:37:42,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=381813.3333333333, ans=0.125 2023-12-22 03:37:53,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=381880.0, ans=0.1 2023-12-22 03:37:55,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.47 vs. limit=22.5 2023-12-22 03:37:57,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=381880.0, ans=0.04949747468305833 2023-12-22 03:38:02,622 INFO [train.py:886] (1/4) Epoch 13, batch 100, loss[loss=0.01733, audio_tagging_loss=0.01733, over 25000.00 frames. ], tot_loss[loss=0.02024, audio_tagging_loss=0.02024, over 1970944.77 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0 2023-12-22 03:38:14,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=382013.3333333333, ans=0.125 2023-12-22 03:38:19,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=382013.3333333333, ans=0.125 2023-12-22 03:38:24,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. 
limit=15.0 2023-12-22 03:38:39,109 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+01 2.907e+01 3.118e+01 3.285e+01 3.851e+01, threshold=6.236e+01, percent-clipped=0.0 2023-12-22 03:38:42,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382146.6666666667, ans=0.1 2023-12-22 03:38:47,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=382213.3333333333, ans=0.125 2023-12-22 03:38:49,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=382213.3333333333, ans=0.125 2023-12-22 03:38:54,836 INFO [train.py:886] (1/4) Epoch 13, batch 150, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 2637505.60 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0 2023-12-22 03:38:55,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=382280.0, ans=0.125 2023-12-22 03:38:57,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=382280.0, ans=0.07 2023-12-22 03:39:09,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=12.0 2023-12-22 03:39:14,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.64 vs. limit=22.5 2023-12-22 03:39:27,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=382480.0, ans=0.125 2023-12-22 03:39:46,377 INFO [train.py:886] (1/4) Epoch 13, batch 200, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 3157891.13 frames. ], batch size: 100, lr: 8.57e-03, grad_scale: 32.0 2023-12-22 03:39:58,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=382680.0, ans=0.125 2023-12-22 03:40:05,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=382680.0, ans=0.2 2023-12-22 03:40:11,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=382746.6666666667, ans=0.125 2023-12-22 03:40:12,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382746.6666666667, ans=0.1 2023-12-22 03:40:15,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382746.6666666667, ans=0.1 2023-12-22 03:40:15,691 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.964e-01 2023-12-22 03:40:22,765 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.728e+01 2.862e+01 2.980e+01 3.546e+01, threshold=5.723e+01, percent-clipped=0.0 2023-12-22 03:40:25,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.10 vs. 
limit=22.5 2023-12-22 03:40:38,326 INFO [train.py:886] (1/4) Epoch 13, batch 250, loss[loss=0.01611, audio_tagging_loss=0.01611, over 25000.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 3563429.82 frames. ], batch size: 100, lr: 8.57e-03, grad_scale: 32.0 2023-12-22 03:40:48,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383013.3333333333, ans=0.1 2023-12-22 03:41:03,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383080.0, ans=0.1 2023-12-22 03:41:08,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=383146.6666666667, ans=0.125 2023-12-22 03:41:17,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383146.6666666667, ans=0.125 2023-12-22 03:41:18,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=383146.6666666667, ans=0.0 2023-12-22 03:41:29,766 INFO [train.py:886] (1/4) Epoch 13, batch 300, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 3866467.94 frames. ], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:41:42,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2023-12-22 03:41:43,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=383346.6666666667, ans=0.0 2023-12-22 03:41:47,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=15.0 2023-12-22 03:42:06,353 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 2.665e+01 2.854e+01 3.043e+01 3.614e+01, threshold=5.708e+01, percent-clipped=0.0 2023-12-22 03:42:22,007 INFO [train.py:886] (1/4) Epoch 13, batch 350, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 4102602.18 frames. ], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:42:26,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=383613.3333333333, ans=0.05 2023-12-22 03:43:06,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-22 03:43:14,609 INFO [train.py:886] (1/4) Epoch 13, batch 400, loss[loss=0.01661, audio_tagging_loss=0.01661, over 24750.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4289906.12 frames. 
], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:43:14,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=383946.6666666667, ans=0.0 2023-12-22 03:43:21,506 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.410e-01 2023-12-22 03:43:37,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-12-22 03:43:38,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=384080.0, ans=0.125 2023-12-22 03:43:42,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=384080.0, ans=0.125 2023-12-22 03:43:51,105 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.770e+01 2.913e+01 3.430e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 03:44:02,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=384213.3333333333, ans=0.125 2023-12-22 03:44:02,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=384213.3333333333, ans=0.2 2023-12-22 03:44:05,956 INFO [train.py:886] (1/4) Epoch 13, batch 450, loss[loss=0.01015, audio_tagging_loss=0.01015, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4441078.39 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:44:17,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=384346.6666666667, ans=0.125 2023-12-22 03:44:36,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=384480.0, ans=0.07 2023-12-22 03:44:38,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=384480.0, ans=0.0 2023-12-22 03:44:47,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=384546.6666666667, ans=0.125 2023-12-22 03:44:58,224 INFO [train.py:886] (1/4) Epoch 13, batch 500, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4553530.15 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:44:58,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2023-12-22 03:44:58,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=15.0 2023-12-22 03:44:59,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-22 03:45:22,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. 
limit=10.0 2023-12-22 03:45:28,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=384813.3333333333, ans=0.0 2023-12-22 03:45:30,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=384813.3333333333, ans=0.2 2023-12-22 03:45:34,435 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.645e+01 2.803e+01 2.992e+01 3.772e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-22 03:45:45,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=384880.0, ans=0.05 2023-12-22 03:45:50,094 INFO [train.py:886] (1/4) Epoch 13, batch 550, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4646652.16 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:45:53,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-12-22 03:45:54,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384946.6666666667, ans=0.1 2023-12-22 03:46:02,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=385013.3333333333, ans=0.125 2023-12-22 03:46:09,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=385013.3333333333, ans=0.125 2023-12-22 03:46:33,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2023-12-22 03:46:41,742 INFO [train.py:886] (1/4) Epoch 13, batch 600, loss[loss=0.01329, audio_tagging_loss=0.01329, over 24750.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4714893.20 frames. ], batch size: 99, lr: 8.54e-03, grad_scale: 32.0 2023-12-22 03:46:48,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2023-12-22 03:47:03,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=385413.3333333333, ans=0.125 2023-12-22 03:47:12,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=385480.0, ans=0.2 2023-12-22 03:47:14,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=385480.0, ans=0.125 2023-12-22 03:47:17,930 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.658e+01 2.809e+01 2.976e+01 3.464e+01, threshold=5.617e+01, percent-clipped=0.0 2023-12-22 03:47:22,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=385546.6666666667, ans=0.125 2023-12-22 03:47:27,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=385546.6666666667, ans=0.125 2023-12-22 03:47:31,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=15.0 2023-12-22 03:47:33,626 INFO [train.py:886] (1/4) Epoch 13, batch 650, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4761718.98 frames. ], batch size: 99, lr: 8.54e-03, grad_scale: 32.0 2023-12-22 03:47:38,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=385613.3333333333, ans=0.125 2023-12-22 03:47:42,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=385613.3333333333, ans=0.125 2023-12-22 03:47:49,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=385680.0, ans=0.025 2023-12-22 03:47:58,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.84 vs. limit=22.5 2023-12-22 03:48:12,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. limit=10.0 2023-12-22 03:48:13,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2023-12-22 03:48:18,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=385880.0, ans=10.0 2023-12-22 03:48:21,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0 2023-12-22 03:48:23,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=385946.6666666667, ans=0.125 2023-12-22 03:48:24,010 INFO [train.py:886] (1/4) Epoch 13, batch 700, loss[loss=0.01443, audio_tagging_loss=0.01443, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4798132.40 frames. 
], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:48:34,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-12-22 03:48:43,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=386013.3333333333, ans=0.125 2023-12-22 03:48:45,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.44 vs. limit=22.5 2023-12-22 03:48:46,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=386080.0, ans=0.1 2023-12-22 03:48:53,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2023-12-22 03:49:01,128 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.667e+01 2.765e+01 2.953e+01 3.565e+01, threshold=5.530e+01, percent-clipped=0.0 2023-12-22 03:49:02,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=386146.6666666667, ans=0.0 2023-12-22 03:49:10,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=386213.3333333333, ans=0.125 2023-12-22 03:49:17,657 INFO [train.py:886] (1/4) Epoch 13, batch 750, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4834791.45 frames. ], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:49:58,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2023-12-22 03:50:04,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=386546.6666666667, ans=0.125 2023-12-22 03:50:08,447 INFO [train.py:886] (1/4) Epoch 13, batch 800, loss[loss=0.01599, audio_tagging_loss=0.01599, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4861423.23 frames. ], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:50:17,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=386613.3333333333, ans=0.125 2023-12-22 03:50:17,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.38 vs. 
limit=22.5 2023-12-22 03:50:23,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=386680.0, ans=0.2 2023-12-22 03:50:31,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=386746.6666666667, ans=0.125 2023-12-22 03:50:34,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=386746.6666666667, ans=0.125 2023-12-22 03:50:45,491 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.686e+01 2.811e+01 2.962e+01 3.601e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 03:51:01,121 INFO [train.py:886] (1/4) Epoch 13, batch 850, loss[loss=0.01244, audio_tagging_loss=0.01244, over 22085.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4873149.43 frames. ], batch size: 107, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:51:01,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=386946.6666666667, ans=0.125 2023-12-22 03:51:02,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=386946.6666666667, ans=0.125 2023-12-22 03:51:04,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=386946.6666666667, ans=0.025 2023-12-22 03:51:15,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=387013.3333333333, ans=0.125 2023-12-22 03:51:24,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-12-22 03:51:31,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=387146.6666666667, ans=0.0 2023-12-22 03:51:36,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=387146.6666666667, ans=0.125 2023-12-22 03:51:51,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387280.0, ans=0.0 2023-12-22 03:51:52,656 INFO [train.py:886] (1/4) Epoch 13, batch 900, loss[loss=0.0181, audio_tagging_loss=0.0181, over 24750.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4891749.52 frames. ], batch size: 99, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:52:00,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387280.0, ans=0.1 2023-12-22 03:52:12,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=387413.3333333333, ans=0.0 2023-12-22 03:52:26,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=387480.0, ans=0.0 2023-12-22 03:52:29,460 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.739e+01 2.831e+01 3.018e+01 3.521e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-22 03:52:44,294 INFO [train.py:886] (1/4) Epoch 13, batch 950, loss[loss=0.01633, audio_tagging_loss=0.01633, over 24949.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4900904.52 frames. 
], batch size: 100, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:53:30,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=387880.0, ans=0.125 2023-12-22 03:53:32,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=387880.0, ans=0.125 2023-12-22 03:53:36,364 INFO [train.py:886] (1/4) Epoch 13, batch 1000, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4912321.94 frames. ], batch size: 99, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:53:50,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-12-22 03:53:52,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2023-12-22 03:54:00,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=388080.0, ans=0.1 2023-12-22 03:54:12,661 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+01 2.670e+01 2.865e+01 3.020e+01 3.813e+01, threshold=5.731e+01, percent-clipped=0.0 2023-12-22 03:54:14,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=388146.6666666667, ans=0.125 2023-12-22 03:54:16,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=388213.3333333333, ans=0.125 2023-12-22 03:54:28,331 INFO [train.py:886] (1/4) Epoch 13, batch 1050, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4915261.85 frames. ], batch size: 100, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:54:45,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=388346.6666666667, ans=0.2 2023-12-22 03:54:46,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=388346.6666666667, ans=0.125 2023-12-22 03:54:51,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=388413.3333333333, ans=0.125 2023-12-22 03:55:09,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=12.0 2023-12-22 03:55:20,176 INFO [train.py:886] (1/4) Epoch 13, batch 1100, loss[loss=0.01493, audio_tagging_loss=0.01493, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4926721.86 frames. ], batch size: 100, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:55:38,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2023-12-22 03:55:43,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. 
limit=15.0 2023-12-22 03:55:56,138 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 2.628e+01 2.785e+01 2.893e+01 3.439e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-22 03:56:11,810 INFO [train.py:886] (1/4) Epoch 13, batch 1150, loss[loss=0.01378, audio_tagging_loss=0.01378, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4936559.19 frames. ], batch size: 99, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:56:30,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=389013.3333333333, ans=0.0 2023-12-22 03:56:40,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389080.0, ans=0.1 2023-12-22 03:56:51,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0 2023-12-22 03:56:55,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=389213.3333333333, ans=0.0 2023-12-22 03:57:02,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389280.0, ans=0.1 2023-12-22 03:57:02,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.381e-02 2023-12-22 03:57:04,162 INFO [train.py:886] (1/4) Epoch 13, batch 1200, loss[loss=0.01523, audio_tagging_loss=0.01523, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4943812.34 frames. ], batch size: 100, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:57:05,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=12.0 2023-12-22 03:57:11,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=389280.0, ans=0.09899494936611666 2023-12-22 03:57:12,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=389280.0, ans=0.0 2023-12-22 03:57:14,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389346.6666666667, ans=0.1 2023-12-22 03:57:22,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-12-22 03:57:41,005 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.433e+01 2.655e+01 2.812e+01 2.956e+01 4.274e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 03:57:41,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=389480.0, ans=0.0 2023-12-22 03:57:55,917 INFO [train.py:886] (1/4) Epoch 13, batch 1250, loss[loss=0.01918, audio_tagging_loss=0.01918, over 24946.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4946023.05 frames. ], batch size: 100, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:57:57,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.36 vs. 
limit=12.0 2023-12-22 03:57:58,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=389613.3333333333, ans=0.125 2023-12-22 03:58:07,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2023-12-22 03:58:09,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=389680.0, ans=0.0 2023-12-22 03:58:10,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=389680.0, ans=15.0 2023-12-22 03:58:14,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-12-22 03:58:20,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=389746.6666666667, ans=0.125 2023-12-22 03:58:20,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2023-12-22 03:58:23,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-22 03:58:25,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389746.6666666667, ans=0.0 2023-12-22 03:58:27,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=389813.3333333333, ans=0.2 2023-12-22 03:58:29,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2023-12-22 03:58:42,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=12.0 2023-12-22 03:58:45,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=389880.0, ans=0.125 2023-12-22 03:58:48,335 INFO [train.py:886] (1/4) Epoch 13, batch 1300, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4943819.43 frames. ], batch size: 100, lr: 8.49e-03, grad_scale: 32.0 2023-12-22 03:58:56,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389946.6666666667, ans=0.1 2023-12-22 03:59:02,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2023-12-22 03:59:05,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=390013.3333333333, ans=0.95 2023-12-22 03:59:11,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=390080.0, ans=0.02 2023-12-22 03:59:12,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=12.0 2023-12-22 03:59:24,296 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.761e+01 2.854e+01 2.985e+01 3.406e+01, threshold=5.709e+01, percent-clipped=0.0 2023-12-22 03:59:33,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=390213.3333333333, ans=0.2 2023-12-22 03:59:34,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=390213.3333333333, ans=0.1 2023-12-22 03:59:39,924 INFO [train.py:886] (1/4) Epoch 13, batch 1350, loss[loss=0.01634, audio_tagging_loss=0.01634, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4947902.60 frames. ], batch size: 100, lr: 8.49e-03, grad_scale: 32.0 2023-12-22 03:59:45,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=61.07 vs. limit=15.0 2023-12-22 03:59:54,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=390346.6666666667, ans=0.125 2023-12-22 04:00:25,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=390546.6666666667, ans=0.2 2023-12-22 04:00:27,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=390546.6666666667, ans=0.125 2023-12-22 04:00:32,529 INFO [train.py:886] (1/4) Epoch 13, batch 1400, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4942238.70 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:00:42,730 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.695e-03 2023-12-22 04:00:56,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=390746.6666666667, ans=0.125 2023-12-22 04:01:00,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2023-12-22 04:01:08,894 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.605e+01 2.750e+01 2.989e+01 3.413e+01, threshold=5.499e+01, percent-clipped=0.0 2023-12-22 04:01:24,377 INFO [train.py:886] (1/4) Epoch 13, batch 1450, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4947812.52 frames. 
], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:01:25,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=390946.6666666667, ans=0.125 2023-12-22 04:01:36,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=391013.3333333333, ans=0.125 2023-12-22 04:01:54,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=391146.6666666667, ans=0.0 2023-12-22 04:01:56,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391146.6666666667, ans=0.1 2023-12-22 04:02:02,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=391146.6666666667, ans=0.125 2023-12-22 04:02:02,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-12-22 04:02:04,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=15.0 2023-12-22 04:02:09,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391213.3333333333, ans=0.125 2023-12-22 04:02:15,791 INFO [train.py:886] (1/4) Epoch 13, batch 1500, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4957261.19 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:02:19,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-22 04:02:22,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-12-22 04:02:24,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=12.0 2023-12-22 04:02:40,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=391413.3333333333, ans=0.07 2023-12-22 04:02:44,780 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.962e-02 2023-12-22 04:02:47,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=391480.0, ans=0.125 2023-12-22 04:02:51,066 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.726e+01 2.853e+01 3.010e+01 3.854e+01, threshold=5.706e+01, percent-clipped=0.0 2023-12-22 04:03:06,652 INFO [train.py:886] (1/4) Epoch 13, batch 1550, loss[loss=0.01709, audio_tagging_loss=0.01709, over 24750.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4955458.07 frames. ], batch size: 99, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:03:14,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. 
limit=15.0 2023-12-22 04:03:24,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=391680.0, ans=0.125 2023-12-22 04:03:33,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=391746.6666666667, ans=0.125 2023-12-22 04:03:57,487 INFO [train.py:886] (1/4) Epoch 13, batch 1600, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4951636.39 frames. ], batch size: 99, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:04:01,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=391946.6666666667, ans=0.125 2023-12-22 04:04:34,944 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.766e+01 2.914e+01 3.092e+01 3.457e+01, threshold=5.827e+01, percent-clipped=0.0 2023-12-22 04:04:50,748 INFO [train.py:886] (1/4) Epoch 13, batch 1650, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4949485.53 frames. ], batch size: 100, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:04:56,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2023-12-22 04:05:12,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2023-12-22 04:05:21,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=392480.0, ans=0.2 2023-12-22 04:05:27,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=392480.0, ans=0.04949747468305833 2023-12-22 04:05:31,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2023-12-22 04:05:42,339 INFO [train.py:886] (1/4) Epoch 13, batch 1700, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4945763.18 frames. ], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:05:43,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=392613.3333333333, ans=0.125 2023-12-22 04:05:43,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-12-22 04:05:58,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.75 vs. limit=22.5 2023-12-22 04:06:14,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=392813.3333333333, ans=0.125 2023-12-22 04:06:18,233 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.671e+01 2.847e+01 3.058e+01 3.677e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 04:06:33,338 INFO [train.py:886] (1/4) Epoch 13, batch 1750, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4953170.26 frames. 
], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:06:40,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392946.6666666667, ans=0.1 2023-12-22 04:06:57,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=393080.0, ans=0.125 2023-12-22 04:07:02,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=393080.0, ans=0.125 2023-12-22 04:07:13,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=393146.6666666667, ans=0.05 2023-12-22 04:07:25,439 INFO [train.py:886] (1/4) Epoch 13, batch 1800, loss[loss=0.01753, audio_tagging_loss=0.01753, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4957647.87 frames. ], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:07:47,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393413.3333333333, ans=0.1 2023-12-22 04:07:47,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=393413.3333333333, ans=15.0 2023-12-22 04:07:52,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=393413.3333333333, ans=0.125 2023-12-22 04:08:02,359 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.697e+01 2.832e+01 3.014e+01 3.728e+01, threshold=5.665e+01, percent-clipped=0.0 2023-12-22 04:08:06,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=393546.6666666667, ans=0.125 2023-12-22 04:08:17,476 INFO [train.py:886] (1/4) Epoch 13, batch 1850, loss[loss=0.01412, audio_tagging_loss=0.01412, over 23985.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4951118.56 frames. ], batch size: 100, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:08:18,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=393613.3333333333, ans=0.0 2023-12-22 04:08:20,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2023-12-22 04:08:38,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=393746.6666666667, ans=0.0 2023-12-22 04:09:06,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=393880.0, ans=0.0 2023-12-22 04:09:07,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393880.0, ans=0.1 2023-12-22 04:09:09,393 INFO [train.py:886] (1/4) Epoch 13, batch 1900, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4945227.18 frames. 
], batch size: 99, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:09:09,576 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:09:12,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=393946.6666666667, ans=0.125 2023-12-22 04:09:17,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=393946.6666666667, ans=0.0 2023-12-22 04:09:31,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=394080.0, ans=0.125 2023-12-22 04:09:35,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=394080.0, ans=0.0 2023-12-22 04:09:45,143 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.721e+01 2.913e+01 3.026e+01 3.443e+01, threshold=5.825e+01, percent-clipped=0.0 2023-12-22 04:09:47,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394146.6666666667, ans=0.1 2023-12-22 04:09:55,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=394213.3333333333, ans=0.1 2023-12-22 04:10:00,979 INFO [train.py:886] (1/4) Epoch 13, batch 1950, loss[loss=0.01524, audio_tagging_loss=0.01524, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4941159.65 frames. ], batch size: 100, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:10:25,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=394413.3333333333, ans=0.125 2023-12-22 04:10:32,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=394480.0, ans=0.2 2023-12-22 04:10:32,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2023-12-22 04:10:41,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=394546.6666666667, ans=0.125 2023-12-22 04:10:50,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=394546.6666666667, ans=0.125 2023-12-22 04:10:51,755 INFO [train.py:886] (1/4) Epoch 13, batch 2000, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4945217.54 frames. ], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:10:53,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=394613.3333333333, ans=0.125 2023-12-22 04:11:06,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=394680.0, ans=0.04949747468305833 2023-12-22 04:11:09,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=394680.0, ans=0.125 2023-12-22 04:11:20,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.81 vs. 
limit=22.5 2023-12-22 04:11:22,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=394813.3333333333, ans=0.125 2023-12-22 04:11:28,348 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.715e+01 2.846e+01 3.050e+01 3.536e+01, threshold=5.692e+01, percent-clipped=0.0 2023-12-22 04:11:44,782 INFO [train.py:886] (1/4) Epoch 13, batch 2050, loss[loss=0.0161, audio_tagging_loss=0.0161, over 24750.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4948088.80 frames. ], batch size: 99, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:11:50,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=394946.6666666667, ans=0.0 2023-12-22 04:12:02,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=395013.3333333333, ans=0.0 2023-12-22 04:12:12,010 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.815e-02 2023-12-22 04:12:25,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=395213.3333333333, ans=0.125 2023-12-22 04:12:32,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=395213.3333333333, ans=0.125 2023-12-22 04:12:35,672 INFO [train.py:886] (1/4) Epoch 13, batch 2100, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4952684.61 frames. ], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:12:36,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=395280.0, ans=0.0 2023-12-22 04:12:45,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=395346.6666666667, ans=0.2 2023-12-22 04:12:50,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=395346.6666666667, ans=0.125 2023-12-22 04:12:54,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.30 vs. limit=22.5 2023-12-22 04:12:59,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=395413.3333333333, ans=0.0 2023-12-22 04:13:12,416 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.725e+01 2.922e+01 3.047e+01 3.389e+01, threshold=5.843e+01, percent-clipped=0.0 2023-12-22 04:13:13,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=395480.0, ans=0.1 2023-12-22 04:13:27,952 INFO [train.py:886] (1/4) Epoch 13, batch 2150, loss[loss=0.01579, audio_tagging_loss=0.01579, over 24750.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4952636.25 frames. 
], batch size: 99, lr: 8.43e-03, grad_scale: 64.0 2023-12-22 04:14:02,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=395813.3333333333, ans=0.125 2023-12-22 04:14:03,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=395813.3333333333, ans=0.0 2023-12-22 04:14:07,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=395880.0, ans=0.125 2023-12-22 04:14:16,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2023-12-22 04:14:19,319 INFO [train.py:886] (1/4) Epoch 13, batch 2200, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4950328.13 frames. ], batch size: 99, lr: 8.43e-03, grad_scale: 64.0 2023-12-22 04:14:42,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=396080.0, ans=0.2 2023-12-22 04:14:44,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=396080.0, ans=0.2 2023-12-22 04:14:56,168 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 2.696e+01 2.863e+01 2.990e+01 3.495e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 04:14:58,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2023-12-22 04:15:02,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=396213.3333333333, ans=0.125 2023-12-22 04:15:06,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=396213.3333333333, ans=0.0 2023-12-22 04:15:08,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=396213.3333333333, ans=0.125 2023-12-22 04:15:11,038 INFO [train.py:886] (1/4) Epoch 13, batch 2250, loss[loss=0.01607, audio_tagging_loss=0.01607, over 24022.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4942767.61 frames. ], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:15:38,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-12-22 04:15:44,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-12-22 04:15:48,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=396480.0, ans=0.125 2023-12-22 04:15:52,494 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:15:58,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=396546.6666666667, ans=0.07 2023-12-22 04:16:03,311 INFO [train.py:886] (1/4) Epoch 13, batch 2300, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4940249.29 frames. 
], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:16:10,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=396613.3333333333, ans=0.2 2023-12-22 04:16:26,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=396746.6666666667, ans=0.2 2023-12-22 04:16:34,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=396813.3333333333, ans=0.2 2023-12-22 04:16:39,346 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.666e+01 2.800e+01 2.964e+01 3.386e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-22 04:16:55,669 INFO [train.py:886] (1/4) Epoch 13, batch 2350, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4945265.16 frames. ], batch size: 99, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:17:05,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=397013.3333333333, ans=0.0 2023-12-22 04:17:30,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0 2023-12-22 04:17:33,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=397146.6666666667, ans=0.0 2023-12-22 04:17:37,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=397213.3333333333, ans=0.2 2023-12-22 04:17:41,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2023-12-22 04:17:46,558 INFO [train.py:886] (1/4) Epoch 13, batch 2400, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4950650.43 frames. ], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:17:49,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=397280.0, ans=0.0 2023-12-22 04:17:52,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=397280.0, ans=0.0 2023-12-22 04:18:23,443 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.690e+01 2.809e+01 2.996e+01 3.440e+01, threshold=5.617e+01, percent-clipped=0.0 2023-12-22 04:18:25,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=397480.0, ans=0.125 2023-12-22 04:18:29,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=397546.6666666667, ans=0.07 2023-12-22 04:18:39,168 INFO [train.py:886] (1/4) Epoch 13, batch 2450, loss[loss=0.01522, audio_tagging_loss=0.01522, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4951742.27 frames. 
], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:18:49,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=397680.0, ans=0.0 2023-12-22 04:18:52,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=397680.0, ans=0.2 2023-12-22 04:19:14,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=397813.3333333333, ans=0.0 2023-12-22 04:19:16,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-12-22 04:19:19,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=397880.0, ans=0.05 2023-12-22 04:19:30,698 INFO [train.py:886] (1/4) Epoch 13, batch 2500, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4948573.93 frames. ], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:19:47,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=398013.3333333333, ans=0.125 2023-12-22 04:19:50,934 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:19:55,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=398080.0, ans=0.2 2023-12-22 04:20:07,965 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+01 2.733e+01 2.874e+01 2.990e+01 3.625e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 04:20:08,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=398146.6666666667, ans=0.2 2023-12-22 04:20:23,088 INFO [train.py:886] (1/4) Epoch 13, batch 2550, loss[loss=0.01614, audio_tagging_loss=0.01614, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4944287.03 frames. ], batch size: 99, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:20:38,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=398346.6666666667, ans=0.125 2023-12-22 04:20:41,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=398346.6666666667, ans=0.0 2023-12-22 04:20:43,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=398413.3333333333, ans=15.0 2023-12-22 04:20:49,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. 
limit=15.0 2023-12-22 04:20:51,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=398413.3333333333, ans=0.0 2023-12-22 04:20:55,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=398480.0, ans=0.0 2023-12-22 04:21:00,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=398480.0, ans=0.125 2023-12-22 04:21:13,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398546.6666666667, ans=0.1 2023-12-22 04:21:15,687 INFO [train.py:886] (1/4) Epoch 13, batch 2600, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24750.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4944932.09 frames. ], batch size: 99, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:21:18,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=398613.3333333333, ans=0.125 2023-12-22 04:21:31,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398680.0, ans=0.1 2023-12-22 04:21:45,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398813.3333333333, ans=0.1 2023-12-22 04:21:47,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=398813.3333333333, ans=0.2 2023-12-22 04:21:51,209 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.732e+01 2.842e+01 3.020e+01 3.869e+01, threshold=5.684e+01, percent-clipped=0.0 2023-12-22 04:21:51,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-12-22 04:21:57,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=15.0 2023-12-22 04:22:06,044 INFO [train.py:886] (1/4) Epoch 13, batch 2650, loss[loss=0.01528, audio_tagging_loss=0.01528, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4949927.69 frames. ], batch size: 100, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:22:10,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=398946.6666666667, ans=0.2 2023-12-22 04:22:58,325 INFO [train.py:886] (1/4) Epoch 13, batch 2700, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4950072.57 frames. ], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:23:12,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=399346.6666666667, ans=0.05 2023-12-22 04:23:25,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=15.0 2023-12-22 04:23:28,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. 
limit=15.0 2023-12-22 04:23:30,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=399480.0, ans=0.125 2023-12-22 04:23:33,775 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.672e+01 2.778e+01 2.950e+01 3.329e+01, threshold=5.555e+01, percent-clipped=0.0 2023-12-22 04:23:37,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=399546.6666666667, ans=0.125 2023-12-22 04:23:38,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=399546.6666666667, ans=0.0 2023-12-22 04:23:40,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=399546.6666666667, ans=0.125 2023-12-22 04:23:45,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=399546.6666666667, ans=0.125 2023-12-22 04:23:48,675 INFO [train.py:886] (1/4) Epoch 13, batch 2750, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4950400.45 frames. ], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:23:59,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0 2023-12-22 04:24:05,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-12-22 04:24:11,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=399746.6666666667, ans=0.125 2023-12-22 04:24:13,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=399746.6666666667, ans=0.0 2023-12-22 04:24:26,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399813.3333333333, ans=0.1 2023-12-22 04:24:32,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=399880.0, ans=0.2 2023-12-22 04:24:32,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=399880.0, ans=0.125 2023-12-22 04:24:34,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=12.0 2023-12-22 04:24:34,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=399880.0, ans=0.2 2023-12-22 04:24:40,136 INFO [train.py:886] (1/4) Epoch 13, batch 2800, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4949437.80 frames. 
], batch size: 99, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:24:54,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=400013.3333333333, ans=0.125 2023-12-22 04:25:18,179 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 2.709e+01 2.865e+01 2.998e+01 3.603e+01, threshold=5.731e+01, percent-clipped=0.0 2023-12-22 04:25:33,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=400280.0, ans=0.04949747468305833 2023-12-22 04:25:34,329 INFO [train.py:886] (1/4) Epoch 13, batch 2850, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4939742.73 frames. ], batch size: 99, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:25:35,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2023-12-22 04:25:54,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=400413.3333333333, ans=0.125 2023-12-22 04:25:56,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=400413.3333333333, ans=0.07 2023-12-22 04:26:13,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=400480.0, ans=0.125 2023-12-22 04:26:16,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=400546.6666666667, ans=0.0 2023-12-22 04:26:25,175 INFO [train.py:886] (1/4) Epoch 13, batch 2900, loss[loss=0.01838, audio_tagging_loss=0.01838, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4943135.31 frames. ], batch size: 99, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:26:26,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=400613.3333333333, ans=0.0 2023-12-22 04:26:28,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=400613.3333333333, ans=0.2 2023-12-22 04:26:57,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2023-12-22 04:26:59,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400813.3333333333, ans=0.1 2023-12-22 04:27:01,893 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 2.689e+01 2.815e+01 2.998e+01 3.858e+01, threshold=5.630e+01, percent-clipped=0.0 2023-12-22 04:27:03,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=400813.3333333333, ans=0.2 2023-12-22 04:27:03,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.45 vs. 
limit=15.0 2023-12-22 04:27:13,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=400880.0, ans=0.0 2023-12-22 04:27:14,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=400880.0, ans=0.1 2023-12-22 04:27:16,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=400946.6666666667, ans=0.125 2023-12-22 04:27:17,537 INFO [train.py:886] (1/4) Epoch 13, batch 2950, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24018.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4947223.29 frames. ], batch size: 100, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:27:20,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=400946.6666666667, ans=0.1 2023-12-22 04:27:24,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=400946.6666666667, ans=0.125 2023-12-22 04:28:07,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=401280.0, ans=0.0 2023-12-22 04:28:07,809 INFO [train.py:886] (1/4) Epoch 13, batch 3000, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4946300.34 frames. ], batch size: 100, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:28:07,810 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 04:28:28,505 INFO [train.py:917] (1/4) Epoch 13, validation: loss=0.03396, audio_tagging_loss=0.03396, over 3737520.00 frames. 2023-12-22 04:28:28,506 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 04:28:31,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=401280.0, ans=0.125 2023-12-22 04:28:48,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=401413.3333333333, ans=10.0 2023-12-22 04:29:04,379 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.650e+01 2.784e+01 2.965e+01 3.758e+01, threshold=5.568e+01, percent-clipped=0.0 2023-12-22 04:29:11,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=401546.6666666667, ans=0.0 2023-12-22 04:29:12,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2023-12-22 04:29:20,116 INFO [train.py:886] (1/4) Epoch 13, batch 3050, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4948871.65 frames. ], batch size: 100, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:29:24,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=401613.3333333333, ans=0.125 2023-12-22 04:29:28,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=401680.0, ans=0.125 2023-12-22 04:29:38,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. 
limit=6.0 2023-12-22 04:29:42,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=401746.6666666667, ans=0.0 2023-12-22 04:29:48,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-22 04:29:50,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=401813.3333333333, ans=0.0 2023-12-22 04:29:54,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5 2023-12-22 04:29:58,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401813.3333333333, ans=0.1 2023-12-22 04:30:10,224 INFO [train.py:886] (1/4) Epoch 13, batch 3100, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4945754.39 frames. ], batch size: 99, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:30:16,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=401946.6666666667, ans=0.5 2023-12-22 04:30:28,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=402013.3333333333, ans=0.125 2023-12-22 04:30:40,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402146.6666666667, ans=0.125 2023-12-22 04:30:42,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=402146.6666666667, ans=0.09899494936611666 2023-12-22 04:30:45,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=402146.6666666667, ans=0.0 2023-12-22 04:30:46,091 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.729e+01 2.844e+01 2.954e+01 3.475e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-22 04:30:46,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=402146.6666666667, ans=0.125 2023-12-22 04:31:01,792 INFO [train.py:886] (1/4) Epoch 13, batch 3150, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24047.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4939548.09 frames. ], batch size: 100, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:31:06,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2023-12-22 04:31:11,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2023-12-22 04:31:30,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=402413.3333333333, ans=0.125 2023-12-22 04:31:39,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=402480.0, ans=0.125 2023-12-22 04:31:52,772 INFO [train.py:886] (1/4) Epoch 13, batch 3200, loss[loss=0.01585, audio_tagging_loss=0.01585, over 25000.00 frames. 
], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4938419.47 frames. ], batch size: 100, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:31:55,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=402613.3333333333, ans=0.0 2023-12-22 04:31:56,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=402613.3333333333, ans=0.0 2023-12-22 04:32:03,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=402680.0, ans=0.1 2023-12-22 04:32:08,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=402680.0, ans=0.0 2023-12-22 04:32:18,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=402746.6666666667, ans=0.125 2023-12-22 04:32:27,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=402813.3333333333, ans=10.0 2023-12-22 04:32:29,653 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.680e+01 2.806e+01 3.000e+01 3.606e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 04:32:35,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=402880.0, ans=0.2 2023-12-22 04:32:45,305 INFO [train.py:886] (1/4) Epoch 13, batch 3250, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4939106.19 frames. ], batch size: 100, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:32:46,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2023-12-22 04:32:51,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=402946.6666666667, ans=0.07 2023-12-22 04:32:56,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=403013.3333333333, ans=0.0 2023-12-22 04:32:56,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=403013.3333333333, ans=0.2 2023-12-22 04:33:07,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=403080.0, ans=0.125 2023-12-22 04:33:18,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=403146.6666666667, ans=0.125 2023-12-22 04:33:37,345 INFO [train.py:886] (1/4) Epoch 13, batch 3300, loss[loss=0.01519, audio_tagging_loss=0.01519, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4940927.07 frames. 
], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:33:37,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=403280.0, ans=0.125 2023-12-22 04:33:52,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=403346.6666666667, ans=0.125 2023-12-22 04:34:02,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2023-12-22 04:34:11,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=403480.0, ans=0.125 2023-12-22 04:34:13,616 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+01 2.672e+01 2.806e+01 3.015e+01 3.427e+01, threshold=5.612e+01, percent-clipped=0.0 2023-12-22 04:34:29,312 INFO [train.py:886] (1/4) Epoch 13, batch 3350, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4948691.02 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:34:29,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=403613.3333333333, ans=0.035 2023-12-22 04:34:47,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=403680.0, ans=0.0 2023-12-22 04:34:54,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=403746.6666666667, ans=0.1 2023-12-22 04:35:04,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=403813.3333333333, ans=0.0 2023-12-22 04:35:20,733 INFO [train.py:886] (1/4) Epoch 13, batch 3400, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4954642.16 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:35:46,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=404080.0, ans=0.125 2023-12-22 04:35:47,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=404080.0, ans=22.5 2023-12-22 04:35:56,796 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.740e+01 2.906e+01 3.028e+01 3.470e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 04:36:01,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=404213.3333333333, ans=0.0 2023-12-22 04:36:11,668 INFO [train.py:886] (1/4) Epoch 13, batch 3450, loss[loss=0.01547, audio_tagging_loss=0.01547, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4950988.17 frames. ], batch size: 99, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:36:23,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.59 vs. 
limit=15.0 2023-12-22 04:36:30,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=404346.6666666667, ans=0.0 2023-12-22 04:36:32,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=404413.3333333333, ans=0.09899494936611666 2023-12-22 04:36:32,963 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:36:54,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=404546.6666666667, ans=0.125 2023-12-22 04:37:01,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=404546.6666666667, ans=10.0 2023-12-22 04:37:03,794 INFO [train.py:886] (1/4) Epoch 13, batch 3500, loss[loss=0.01353, audio_tagging_loss=0.01353, over 22719.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4948782.49 frames. ], batch size: 107, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:37:04,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=404613.3333333333, ans=0.0 2023-12-22 04:37:07,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=404613.3333333333, ans=0.125 2023-12-22 04:37:25,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=404746.6666666667, ans=0.0 2023-12-22 04:37:32,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=404746.6666666667, ans=0.0 2023-12-22 04:37:39,808 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.687e+01 2.886e+01 3.053e+01 3.392e+01, threshold=5.771e+01, percent-clipped=0.0 2023-12-22 04:37:40,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=404813.3333333333, ans=0.07 2023-12-22 04:37:55,465 INFO [train.py:886] (1/4) Epoch 13, batch 3550, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4945261.80 frames. ], batch size: 100, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:38:17,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=405080.0, ans=10.0 2023-12-22 04:38:23,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=405080.0, ans=0.0 2023-12-22 04:38:23,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-22 04:38:25,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405146.6666666667, ans=0.1 2023-12-22 04:38:26,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.72 vs. limit=10.0 2023-12-22 04:38:47,567 INFO [train.py:886] (1/4) Epoch 13, batch 3600, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4950059.58 frames. 
], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:38:54,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=405280.0, ans=0.0 2023-12-22 04:38:57,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=405346.6666666667, ans=0.0 2023-12-22 04:38:57,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=405346.6666666667, ans=0.125 2023-12-22 04:39:19,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=405480.0, ans=0.125 2023-12-22 04:39:20,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=405480.0, ans=0.125 2023-12-22 04:39:23,612 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.719e+01 2.868e+01 3.005e+01 3.537e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-22 04:39:34,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405546.6666666667, ans=0.1 2023-12-22 04:39:39,986 INFO [train.py:886] (1/4) Epoch 13, batch 3650, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4948131.26 frames. ], batch size: 99, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:39:58,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2023-12-22 04:40:02,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=405746.6666666667, ans=0.1 2023-12-22 04:40:16,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=405813.3333333333, ans=0.125 2023-12-22 04:40:17,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0 2023-12-22 04:40:20,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=405880.0, ans=0.0 2023-12-22 04:40:26,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405880.0, ans=0.1 2023-12-22 04:40:27,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=405880.0, ans=0.2 2023-12-22 04:40:30,634 INFO [train.py:886] (1/4) Epoch 13, batch 3700, loss[loss=0.01573, audio_tagging_loss=0.01573, over 25000.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4953670.75 frames. 
], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:40:51,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=406080.0, ans=0.125 2023-12-22 04:41:06,771 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.738e+01 2.839e+01 2.971e+01 4.016e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 04:41:19,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=406213.3333333333, ans=0.1 2023-12-22 04:41:22,592 INFO [train.py:886] (1/4) Epoch 13, batch 3750, loss[loss=0.01535, audio_tagging_loss=0.01535, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4952702.53 frames. ], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:41:36,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=406346.6666666667, ans=0.125 2023-12-22 04:42:09,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=406546.6666666667, ans=0.1 2023-12-22 04:42:12,668 INFO [train.py:886] (1/4) Epoch 13, batch 3800, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4945148.75 frames. ], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:42:42,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=406813.3333333333, ans=0.2 2023-12-22 04:42:48,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=406813.3333333333, ans=0.0 2023-12-22 04:42:49,594 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.729e+01 2.855e+01 2.947e+01 3.530e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-22 04:42:56,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=406880.0, ans=0.0 2023-12-22 04:43:02,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-12-22 04:43:05,325 INFO [train.py:886] (1/4) Epoch 13, batch 3850, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4944043.89 frames. ], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:43:10,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=406946.6666666667, ans=0.0 2023-12-22 04:43:10,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=406946.6666666667, ans=0.125 2023-12-22 04:43:13,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.60 vs. 
limit=15.0 2023-12-22 04:43:14,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=407013.3333333333, ans=0.0 2023-12-22 04:43:17,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=407013.3333333333, ans=0.125 2023-12-22 04:43:30,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0 2023-12-22 04:43:46,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-12-22 04:43:57,671 INFO [train.py:886] (1/4) Epoch 13, batch 3900, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4949152.13 frames. ], batch size: 100, lr: 8.31e-03, grad_scale: 64.0 2023-12-22 04:44:03,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=407280.0, ans=0.2 2023-12-22 04:44:19,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.39 vs. limit=22.5 2023-12-22 04:44:24,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=407413.3333333333, ans=0.0 2023-12-22 04:44:32,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-22 04:44:32,901 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:44:33,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2023-12-22 04:44:33,657 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.742e+01 2.835e+01 2.926e+01 3.497e+01, threshold=5.671e+01, percent-clipped=0.0 2023-12-22 04:44:34,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=407480.0, ans=0.125 2023-12-22 04:44:35,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.50 vs. limit=15.0 2023-12-22 04:44:39,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407546.6666666667, ans=0.1 2023-12-22 04:44:41,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407546.6666666667, ans=0.1 2023-12-22 04:44:48,708 INFO [train.py:886] (1/4) Epoch 13, batch 3950, loss[loss=0.0165, audio_tagging_loss=0.0165, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4954190.70 frames. 
], batch size: 99, lr: 8.31e-03, grad_scale: 64.0 2023-12-22 04:44:49,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=407613.3333333333, ans=0.5 2023-12-22 04:44:50,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407613.3333333333, ans=0.1 2023-12-22 04:44:51,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=407613.3333333333, ans=0.125 2023-12-22 04:44:56,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=407613.3333333333, ans=0.0 2023-12-22 04:45:02,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-12-22 04:45:12,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407746.6666666667, ans=0.1 2023-12-22 04:45:41,177 INFO [train.py:886] (1/4) Epoch 13, batch 4000, loss[loss=0.01277, audio_tagging_loss=0.01277, over 22286.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4951523.46 frames. ], batch size: 107, lr: 8.31e-03, grad_scale: 128.0 2023-12-22 04:46:18,849 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+01 2.770e+01 2.894e+01 3.015e+01 3.399e+01, threshold=5.788e+01, percent-clipped=0.0 2023-12-22 04:46:32,157 INFO [train.py:886] (1/4) Epoch 13, batch 4050, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4950782.02 frames. ], batch size: 99, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:46:38,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=408280.0, ans=0.0 2023-12-22 04:46:49,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408346.6666666667, ans=0.1 2023-12-22 04:46:56,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=408413.3333333333, ans=0.125 2023-12-22 04:46:56,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=408413.3333333333, ans=15.0 2023-12-22 04:46:57,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=408413.3333333333, ans=0.125 2023-12-22 04:47:02,398 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:47:05,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=408480.0, ans=0.125 2023-12-22 04:47:07,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=408480.0, ans=0.125 2023-12-22 04:47:24,283 INFO [train.py:886] (1/4) Epoch 13, batch 4100, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24938.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4947453.48 frames. 
], batch size: 100, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:47:35,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=408680.0, ans=0.2 2023-12-22 04:47:43,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=408746.6666666667, ans=0.0 2023-12-22 04:47:43,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-12-22 04:47:53,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=408813.3333333333, ans=0.0 2023-12-22 04:48:00,115 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 2.711e+01 2.874e+01 3.032e+01 3.484e+01, threshold=5.749e+01, percent-clipped=0.0 2023-12-22 04:48:12,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=408880.0, ans=0.2 2023-12-22 04:48:13,966 INFO [train.py:886] (1/4) Epoch 13, batch 4150, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4945141.13 frames. ], batch size: 99, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:48:17,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=408946.6666666667, ans=0.2 2023-12-22 04:48:31,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=409013.3333333333, ans=0.2 2023-12-22 04:49:03,487 INFO [train.py:886] (1/4) Epoch 13, batch 4200, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4943789.44 frames. ], batch size: 99, lr: 8.29e-03, grad_scale: 64.0 2023-12-22 04:49:28,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-12-22 04:49:33,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2023-12-22 04:49:39,681 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.633e+01 2.827e+01 2.966e+01 3.504e+01, threshold=5.654e+01, percent-clipped=0.0 2023-12-22 04:49:41,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=409480.0, ans=0.2 2023-12-22 04:49:55,221 INFO [train.py:886] (1/4) Epoch 13, batch 4250, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4953507.00 frames. ], batch size: 100, lr: 8.29e-03, grad_scale: 64.0 2023-12-22 04:50:45,481 INFO [train.py:886] (1/4) Epoch 13, batch 4300, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4956206.84 frames. 
], batch size: 100, lr: 8.29e-03, grad_scale: 64.0 2023-12-22 04:51:01,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410013.3333333333, ans=0.1 2023-12-22 04:51:04,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=410013.3333333333, ans=0.2 2023-12-22 04:51:07,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=410080.0, ans=10.0 2023-12-22 04:51:20,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.78 vs. limit=22.5 2023-12-22 04:51:21,870 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.664e+01 2.847e+01 3.002e+01 3.748e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 04:51:31,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=410213.3333333333, ans=0.0 2023-12-22 04:51:35,986 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:51:36,651 INFO [train.py:886] (1/4) Epoch 13, batch 4350, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4957726.36 frames. ], batch size: 99, lr: 8.28e-03, grad_scale: 64.0 2023-12-22 04:51:56,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-12-22 04:52:07,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=410480.0, ans=0.125 2023-12-22 04:52:10,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=410480.0, ans=0.125 2023-12-22 04:52:16,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410546.6666666667, ans=0.1 2023-12-22 04:52:26,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=410613.3333333333, ans=0.5 2023-12-22 04:52:27,235 INFO [train.py:886] (1/4) Epoch 13, batch 4400, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4952502.58 frames. ], batch size: 99, lr: 8.28e-03, grad_scale: 64.0 2023-12-22 04:52:28,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=410613.3333333333, ans=0.125 2023-12-22 04:52:43,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=410680.0, ans=0.125 2023-12-22 04:52:52,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=15.0 2023-12-22 04:53:05,343 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.774e+01 2.948e+01 3.077e+01 3.816e+01, threshold=5.896e+01, percent-clipped=0.0 2023-12-22 04:53:10,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=410880.0, ans=0.125 2023-12-22 04:53:19,284 INFO [train.py:886] (1/4) Epoch 13, batch 4450, loss[loss=0.01273, audio_tagging_loss=0.01273, over 22527.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4947592.92 frames. ], batch size: 107, lr: 8.28e-03, grad_scale: 64.0 2023-12-22 04:53:39,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=411080.0, ans=0.125 2023-12-22 04:53:48,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2023-12-22 04:53:51,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=411146.6666666667, ans=0.0 2023-12-22 04:54:07,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-12-22 04:54:10,066 INFO [train.py:886] (1/4) Epoch 13, batch 4500, loss[loss=0.0174, audio_tagging_loss=0.0174, over 24750.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4950288.00 frames. ], batch size: 99, lr: 8.27e-03, grad_scale: 64.0 2023-12-22 04:54:17,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=411280.0, ans=0.125 2023-12-22 04:54:19,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=411346.6666666667, ans=0.125 2023-12-22 04:54:30,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=411413.3333333333, ans=0.2 2023-12-22 04:54:45,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=15.0 2023-12-22 04:54:47,747 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.626e+01 2.809e+01 2.913e+01 3.485e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-22 04:55:02,380 INFO [train.py:886] (1/4) Epoch 13, batch 4550, loss[loss=0.01514, audio_tagging_loss=0.01514, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4951786.09 frames. 
], batch size: 100, lr: 8.27e-03, grad_scale: 64.0 2023-12-22 04:55:05,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=411613.3333333333, ans=0.04949747468305833 2023-12-22 04:55:12,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=411680.0, ans=0.125 2023-12-22 04:55:35,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411813.3333333333, ans=0.1 2023-12-22 04:55:41,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=411813.3333333333, ans=0.125 2023-12-22 04:55:41,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=411813.3333333333, ans=0.2 2023-12-22 04:55:44,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=411880.0, ans=0.0 2023-12-22 04:55:45,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=411880.0, ans=0.0 2023-12-22 04:55:49,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=411880.0, ans=0.0 2023-12-22 04:55:53,114 INFO [train.py:886] (1/4) Epoch 13, batch 4600, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24039.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4949033.85 frames. ], batch size: 100, lr: 8.27e-03, grad_scale: 64.0 2023-12-22 04:55:57,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411946.6666666667, ans=0.1 2023-12-22 04:56:15,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=412080.0, ans=0.2 2023-12-22 04:56:15,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412080.0, ans=0.1 2023-12-22 04:56:17,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=412080.0, ans=0.125 2023-12-22 04:56:20,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=412080.0, ans=0.125 2023-12-22 04:56:30,632 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.702e+01 2.812e+01 2.973e+01 3.804e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 04:56:33,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=412213.3333333333, ans=0.125 2023-12-22 04:56:39,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-12-22 04:56:45,867 INFO [train.py:886] (1/4) Epoch 13, batch 4650, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4955003.39 frames. 
], batch size: 100, lr: 8.26e-03, grad_scale: 64.0 2023-12-22 04:56:50,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=412280.0, ans=0.5 2023-12-22 04:57:01,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=412346.6666666667, ans=0.1 2023-12-22 04:57:21,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=412480.0, ans=0.0 2023-12-22 04:57:24,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412480.0, ans=0.1 2023-12-22 04:57:35,747 INFO [train.py:886] (1/4) Epoch 13, batch 4700, loss[loss=0.01809, audio_tagging_loss=0.01809, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4952656.07 frames. ], batch size: 99, lr: 8.26e-03, grad_scale: 64.0 2023-12-22 04:57:35,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412613.3333333333, ans=0.125 2023-12-22 04:57:50,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=412680.0, ans=0.125 2023-12-22 04:58:03,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0 2023-12-22 04:58:10,163 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.759e+01 2.926e+01 3.098e+01 3.768e+01, threshold=5.852e+01, percent-clipped=0.0 2023-12-22 04:58:18,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=412880.0, ans=0.125 2023-12-22 04:58:21,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-12-22 04:58:23,359 INFO [train.py:886] (1/4) Epoch 13, batch 4750, loss[loss=0.01777, audio_tagging_loss=0.01777, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4947997.50 frames. ], batch size: 100, lr: 8.26e-03, grad_scale: 64.0 2023-12-22 04:58:23,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=412946.6666666667, ans=0.2 2023-12-22 04:58:31,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=413013.3333333333, ans=0.0 2023-12-22 04:59:00,169 INFO [train.py:886] (1/4) Epoch 14, batch 0, loss[loss=0.02963, audio_tagging_loss=0.02963, over 23983.00 frames. ], tot_loss[loss=0.02963, audio_tagging_loss=0.02963, over 23983.00 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0 2023-12-22 04:59:00,170 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 04:59:20,852 INFO [train.py:917] (1/4) Epoch 14, validation: loss=0.0333, audio_tagging_loss=0.0333, over 3737520.00 frames. 2023-12-22 04:59:20,852 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 04:59:24,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. 
limit=15.0 2023-12-22 04:59:26,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2023-12-22 04:59:28,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=413053.3333333333, ans=0.125 2023-12-22 04:59:49,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=413186.6666666667, ans=0.125 2023-12-22 04:59:55,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=413253.3333333333, ans=0.0 2023-12-22 05:00:14,001 INFO [train.py:886] (1/4) Epoch 14, batch 50, loss[loss=0.01869, audio_tagging_loss=0.01869, over 25000.00 frames. ], tot_loss[loss=0.02258, audio_tagging_loss=0.02258, over 1116840.30 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0 2023-12-22 05:00:17,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=413386.6666666667, ans=0.125 2023-12-22 05:00:22,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=413386.6666666667, ans=0.125 2023-12-22 05:00:30,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=413453.3333333333, ans=0.0 2023-12-22 05:00:33,567 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+01 2.957e+01 3.426e+01 4.066e+01 1.021e+02, threshold=6.852e+01, percent-clipped=7.0 2023-12-22 05:00:39,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=413520.0, ans=0.125 2023-12-22 05:00:39,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=413520.0, ans=0.125 2023-12-22 05:00:55,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=15.0 2023-12-22 05:01:04,579 INFO [train.py:886] (1/4) Epoch 14, batch 100, loss[loss=0.0173, audio_tagging_loss=0.0173, over 25000.00 frames. ], tot_loss[loss=0.01975, audio_tagging_loss=0.01975, over 1970010.41 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0 2023-12-22 05:01:12,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=413720.0, ans=0.125 2023-12-22 05:01:20,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=413786.6666666667, ans=0.05 2023-12-22 05:01:33,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=413853.3333333333, ans=0.0 2023-12-22 05:01:51,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=413986.6666666667, ans=0.1 2023-12-22 05:01:56,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=414053.3333333333, ans=0.125 2023-12-22 05:01:56,690 INFO [train.py:886] (1/4) Epoch 14, batch 150, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. 
], tot_loss[loss=0.01808, audio_tagging_loss=0.01808, over 2640409.27 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0 2023-12-22 05:02:01,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=414053.3333333333, ans=0.0 2023-12-22 05:02:15,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2023-12-22 05:02:16,262 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+01 2.861e+01 3.025e+01 3.235e+01 3.410e+01, threshold=6.050e+01, percent-clipped=0.0 2023-12-22 05:02:37,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=414320.0, ans=0.0 2023-12-22 05:02:41,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414320.0, ans=0.1 2023-12-22 05:02:43,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2023-12-22 05:02:45,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2023-12-22 05:02:47,419 INFO [train.py:886] (1/4) Epoch 14, batch 200, loss[loss=0.01486, audio_tagging_loss=0.01486, over 25000.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 3155442.59 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0 2023-12-22 05:02:59,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=414453.3333333333, ans=0.0 2023-12-22 05:03:02,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=414453.3333333333, ans=0.0 2023-12-22 05:03:02,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=12.0 2023-12-22 05:03:04,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-22 05:03:04,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=414453.3333333333, ans=0.2 2023-12-22 05:03:05,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=414453.3333333333, ans=0.0 2023-12-22 05:03:36,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5 2023-12-22 05:03:38,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2023-12-22 05:03:40,532 INFO [train.py:886] (1/4) Epoch 14, batch 250, loss[loss=0.01291, audio_tagging_loss=0.01291, over 21890.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 3553900.87 frames. 
], batch size: 107, lr: 7.94e-03, grad_scale: 64.0 2023-12-22 05:03:51,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=414786.6666666667, ans=0.125 2023-12-22 05:04:00,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2023-12-22 05:04:00,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 2.722e+01 2.891e+01 3.020e+01 3.428e+01, threshold=5.782e+01, percent-clipped=0.0 2023-12-22 05:04:13,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-12-22 05:04:18,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=414920.0, ans=0.0 2023-12-22 05:04:29,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-22 05:04:29,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=414986.6666666667, ans=0.0 2023-12-22 05:04:31,611 INFO [train.py:886] (1/4) Epoch 14, batch 300, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 3863085.19 frames. ], batch size: 99, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:04:36,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=415053.3333333333, ans=0.2 2023-12-22 05:04:41,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=415120.0, ans=0.0 2023-12-22 05:04:44,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=415120.0, ans=0.5 2023-12-22 05:05:22,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2023-12-22 05:05:23,731 INFO [train.py:886] (1/4) Epoch 14, batch 350, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24750.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4103734.11 frames. ], batch size: 99, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:05:29,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-22 05:05:41,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. 
limit=15.0 2023-12-22 05:05:44,923 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.760e+01 2.874e+01 3.045e+01 3.726e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 05:05:56,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=415586.6666666667, ans=0.07 2023-12-22 05:06:01,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=415586.6666666667, ans=0.125 2023-12-22 05:06:05,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=415653.3333333333, ans=0.2 2023-12-22 05:06:14,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-12-22 05:06:15,484 INFO [train.py:886] (1/4) Epoch 14, batch 400, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4289087.25 frames. ], batch size: 100, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:06:38,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=415853.3333333333, ans=0.04949747468305833 2023-12-22 05:06:43,103 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:06:52,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415920.0, ans=0.125 2023-12-22 05:06:52,101 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:07:07,896 INFO [train.py:886] (1/4) Epoch 14, batch 450, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4436993.58 frames. ], batch size: 100, lr: 7.93e-03, grad_scale: 64.0 2023-12-22 05:07:17,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416120.0, ans=0.1 2023-12-22 05:07:21,227 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.013e-01 2023-12-22 05:07:28,353 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.661e+01 2.801e+01 2.949e+01 3.337e+01, threshold=5.602e+01, percent-clipped=0.0 2023-12-22 05:07:59,879 INFO [train.py:886] (1/4) Epoch 14, batch 500, loss[loss=0.01427, audio_tagging_loss=0.01427, over 21383.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4550770.45 frames. ], batch size: 107, lr: 7.92e-03, grad_scale: 64.0 2023-12-22 05:08:06,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=416386.6666666667, ans=0.0 2023-12-22 05:08:07,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=416386.6666666667, ans=0.0 2023-12-22 05:08:08,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-12-22 05:08:08,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.91 vs. 
limit=22.5 2023-12-22 05:08:16,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=416453.3333333333, ans=0.125 2023-12-22 05:08:24,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=416520.0, ans=0.0 2023-12-22 05:08:30,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2023-12-22 05:08:45,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=416653.3333333333, ans=0.2 2023-12-22 05:08:47,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=416653.3333333333, ans=0.0 2023-12-22 05:08:51,494 INFO [train.py:886] (1/4) Epoch 14, batch 550, loss[loss=0.01587, audio_tagging_loss=0.01587, over 25000.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4641457.84 frames. ], batch size: 100, lr: 7.92e-03, grad_scale: 64.0 2023-12-22 05:08:56,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=416720.0, ans=0.015 2023-12-22 05:08:59,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416720.0, ans=0.1 2023-12-22 05:09:09,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=416786.6666666667, ans=0.125 2023-12-22 05:09:11,911 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.686e+01 2.777e+01 2.975e+01 3.433e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-22 05:09:21,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=416920.0, ans=0.0 2023-12-22 05:09:22,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=416920.0, ans=0.125 2023-12-22 05:09:22,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416920.0, ans=0.0 2023-12-22 05:09:43,198 INFO [train.py:886] (1/4) Epoch 14, batch 600, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4708435.98 frames. 
], batch size: 100, lr: 7.92e-03, grad_scale: 64.0 2023-12-22 05:09:55,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=417120.0, ans=0.0 2023-12-22 05:10:04,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=417186.6666666667, ans=0.0 2023-12-22 05:10:09,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=417186.6666666667, ans=0.125 2023-12-22 05:10:17,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=417253.3333333333, ans=0.125 2023-12-22 05:10:18,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=417253.3333333333, ans=0.125 2023-12-22 05:10:26,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=417320.0, ans=0.0 2023-12-22 05:10:32,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=417320.0, ans=0.125 2023-12-22 05:10:34,854 INFO [train.py:886] (1/4) Epoch 14, batch 650, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4758239.78 frames. ], batch size: 99, lr: 7.91e-03, grad_scale: 64.0 2023-12-22 05:10:45,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=417453.3333333333, ans=0.07 2023-12-22 05:10:56,102 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+01 2.731e+01 2.888e+01 3.018e+01 3.671e+01, threshold=5.777e+01, percent-clipped=0.0 2023-12-22 05:11:19,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=417653.3333333333, ans=0.2 2023-12-22 05:11:27,248 INFO [train.py:886] (1/4) Epoch 14, batch 700, loss[loss=0.0125, audio_tagging_loss=0.0125, over 24750.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4795848.04 frames. 
], batch size: 99, lr: 7.91e-03, grad_scale: 64.0 2023-12-22 05:11:37,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=417786.6666666667, ans=0.2 2023-12-22 05:11:40,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=417786.6666666667, ans=0.1 2023-12-22 05:11:41,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=417786.6666666667, ans=0.125 2023-12-22 05:11:51,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417853.3333333333, ans=0.1 2023-12-22 05:12:06,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=417920.0, ans=0.125 2023-12-22 05:12:09,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=417986.6666666667, ans=0.125 2023-12-22 05:12:11,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=417986.6666666667, ans=0.0 2023-12-22 05:12:13,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417986.6666666667, ans=0.1 2023-12-22 05:12:16,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2023-12-22 05:12:18,783 INFO [train.py:886] (1/4) Epoch 14, batch 750, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4830515.90 frames. ], batch size: 100, lr: 7.91e-03, grad_scale: 64.0 2023-12-22 05:12:21,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=418053.3333333333, ans=0.2 2023-12-22 05:12:25,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=418053.3333333333, ans=0.125 2023-12-22 05:12:39,815 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.688e+01 2.814e+01 2.981e+01 3.514e+01, threshold=5.628e+01, percent-clipped=0.0 2023-12-22 05:12:40,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=418186.6666666667, ans=0.0 2023-12-22 05:12:50,177 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:12:59,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=418320.0, ans=0.125 2023-12-22 05:13:11,012 INFO [train.py:886] (1/4) Epoch 14, batch 800, loss[loss=0.0141, audio_tagging_loss=0.0141, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4856873.62 frames. ], batch size: 99, lr: 7.90e-03, grad_scale: 64.0 2023-12-22 05:13:40,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=418520.0, ans=0.125 2023-12-22 05:14:02,733 INFO [train.py:886] (1/4) Epoch 14, batch 850, loss[loss=0.01533, audio_tagging_loss=0.01533, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4880609.68 frames. 
], batch size: 100, lr: 7.90e-03, grad_scale: 64.0 2023-12-22 05:14:04,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=418720.0, ans=15.0 2023-12-22 05:14:23,683 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.411e+01 2.718e+01 2.822e+01 2.961e+01 3.534e+01, threshold=5.644e+01, percent-clipped=0.0 2023-12-22 05:14:34,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=418920.0, ans=0.2 2023-12-22 05:14:50,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=418986.6666666667, ans=0.125 2023-12-22 05:14:51,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-12-22 05:14:54,314 INFO [train.py:886] (1/4) Epoch 14, batch 900, loss[loss=0.01354, audio_tagging_loss=0.01354, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4899310.88 frames. ], batch size: 99, lr: 7.90e-03, grad_scale: 64.0 2023-12-22 05:14:56,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=419053.3333333333, ans=0.125 2023-12-22 05:15:22,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=419186.6666666667, ans=0.125 2023-12-22 05:15:26,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=419253.3333333333, ans=0.125 2023-12-22 05:15:29,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0 2023-12-22 05:15:34,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419253.3333333333, ans=0.1 2023-12-22 05:15:41,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-12-22 05:15:46,880 INFO [train.py:886] (1/4) Epoch 14, batch 950, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4903556.48 frames. 
], batch size: 99, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:16:07,289 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.744e+01 2.862e+01 3.032e+01 3.515e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-22 05:16:07,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=419520.0, ans=0.0 2023-12-22 05:16:12,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=419520.0, ans=0.02 2023-12-22 05:16:17,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=419586.6666666667, ans=0.0 2023-12-22 05:16:26,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=419586.6666666667, ans=0.125 2023-12-22 05:16:30,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419653.3333333333, ans=0.1 2023-12-22 05:16:38,376 INFO [train.py:886] (1/4) Epoch 14, batch 1000, loss[loss=0.01427, audio_tagging_loss=0.01427, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4911517.93 frames. ], batch size: 99, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:16:40,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.71 vs. limit=22.5 2023-12-22 05:16:46,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=419720.0, ans=0.0 2023-12-22 05:16:48,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419786.6666666667, ans=0.125 2023-12-22 05:16:49,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-22 05:16:50,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419786.6666666667, ans=0.125 2023-12-22 05:16:54,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=419786.6666666667, ans=12.0 2023-12-22 05:16:57,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-22 05:17:09,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=419920.0, ans=0.1 2023-12-22 05:17:16,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=419920.0, ans=0.125 2023-12-22 05:17:19,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=419986.6666666667, ans=0.0 2023-12-22 05:17:25,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=419986.6666666667, ans=0.125 2023-12-22 05:17:25,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.10 vs. 
limit=10.0 2023-12-22 05:17:30,179 INFO [train.py:886] (1/4) Epoch 14, batch 1050, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4921861.82 frames. ], batch size: 100, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:17:39,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=420120.0, ans=0.125 2023-12-22 05:17:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=420120.0, ans=0.2 2023-12-22 05:17:42,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=420120.0, ans=0.125 2023-12-22 05:17:51,093 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.700e+01 2.859e+01 3.033e+01 3.573e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 05:17:52,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=420186.6666666667, ans=0.125 2023-12-22 05:17:53,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=420186.6666666667, ans=0.125 2023-12-22 05:18:15,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=420320.0, ans=0.0 2023-12-22 05:18:21,437 INFO [train.py:886] (1/4) Epoch 14, batch 1100, loss[loss=0.01482, audio_tagging_loss=0.01482, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4929114.09 frames. ], batch size: 99, lr: 7.89e-03, grad_scale: 64.0 2023-12-22 05:18:34,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=420453.3333333333, ans=0.0 2023-12-22 05:18:43,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-22 05:18:43,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420520.0, ans=0.1 2023-12-22 05:18:47,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=420520.0, ans=0.125 2023-12-22 05:18:52,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420586.6666666667, ans=0.1 2023-12-22 05:18:59,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2023-12-22 05:19:11,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420653.3333333333, ans=0.1 2023-12-22 05:19:11,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2023-12-22 05:19:13,594 INFO [train.py:886] (1/4) Epoch 14, batch 1150, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4935528.96 frames. 
], batch size: 100, lr: 7.88e-03, grad_scale: 64.0 2023-12-22 05:19:20,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=420720.0, ans=0.125 2023-12-22 05:19:34,586 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.729e+01 2.852e+01 2.990e+01 3.450e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-22 05:19:42,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=420853.3333333333, ans=0.05 2023-12-22 05:19:46,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=420920.0, ans=0.125 2023-12-22 05:20:02,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=420986.6666666667, ans=0.125 2023-12-22 05:20:05,399 INFO [train.py:886] (1/4) Epoch 14, batch 1200, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4946349.76 frames. ], batch size: 100, lr: 7.88e-03, grad_scale: 64.0 2023-12-22 05:20:10,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=421053.3333333333, ans=0.125 2023-12-22 05:20:15,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=421120.0, ans=0.0 2023-12-22 05:20:15,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=421120.0, ans=0.5 2023-12-22 05:20:22,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=421120.0, ans=0.0 2023-12-22 05:20:34,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=421186.6666666667, ans=0.025 2023-12-22 05:20:42,257 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:20:57,217 INFO [train.py:886] (1/4) Epoch 14, batch 1250, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4942602.35 frames. 
], batch size: 99, lr: 7.88e-03, grad_scale: 128.0 2023-12-22 05:21:04,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=421386.6666666667, ans=0.2 2023-12-22 05:21:04,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=421386.6666666667, ans=0.125 2023-12-22 05:21:14,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=421453.3333333333, ans=0.0 2023-12-22 05:21:19,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=421520.0, ans=0.0 2023-12-22 05:21:20,037 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.730e+01 2.921e+01 3.084e+01 3.740e+01, threshold=5.843e+01, percent-clipped=0.0 2023-12-22 05:21:22,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=421520.0, ans=0.0 2023-12-22 05:21:50,383 INFO [train.py:886] (1/4) Epoch 14, batch 1300, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4940044.22 frames. ], batch size: 99, lr: 7.87e-03, grad_scale: 64.0 2023-12-22 05:21:50,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=421720.0, ans=0.125 2023-12-22 05:22:00,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=421786.6666666667, ans=0.125 2023-12-22 05:22:25,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=421920.0, ans=0.0 2023-12-22 05:22:32,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=421986.6666666667, ans=0.125 2023-12-22 05:22:40,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=422053.3333333333, ans=0.125 2023-12-22 05:22:40,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=422053.3333333333, ans=0.125 2023-12-22 05:22:41,360 INFO [train.py:886] (1/4) Epoch 14, batch 1350, loss[loss=0.01601, audio_tagging_loss=0.01601, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4942886.39 frames. ], batch size: 100, lr: 7.87e-03, grad_scale: 64.0 2023-12-22 05:22:42,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2023-12-22 05:22:46,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=422053.3333333333, ans=0.125 2023-12-22 05:22:55,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2023-12-22 05:22:56,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. 
limit=10.0 2023-12-22 05:22:57,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=422120.0, ans=0.125 2023-12-22 05:22:58,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=422120.0, ans=0.125 2023-12-22 05:23:02,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=422186.6666666667, ans=0.0 2023-12-22 05:23:03,255 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.431e+01 2.742e+01 2.848e+01 3.000e+01 3.738e+01, threshold=5.695e+01, percent-clipped=0.0 2023-12-22 05:23:22,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=422320.0, ans=0.0 2023-12-22 05:23:32,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=422386.6666666667, ans=0.125 2023-12-22 05:23:32,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=422386.6666666667, ans=0.04949747468305833 2023-12-22 05:23:33,440 INFO [train.py:886] (1/4) Epoch 14, batch 1400, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4949361.11 frames. ], batch size: 100, lr: 7.87e-03, grad_scale: 64.0 2023-12-22 05:23:33,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=422386.6666666667, ans=0.125 2023-12-22 05:24:01,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=422520.0, ans=0.125 2023-12-22 05:24:03,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0 2023-12-22 05:24:06,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5 2023-12-22 05:24:11,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422586.6666666667, ans=0.1 2023-12-22 05:24:13,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2023-12-22 05:24:21,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=422653.3333333333, ans=0.0 2023-12-22 05:24:25,650 INFO [train.py:886] (1/4) Epoch 14, batch 1450, loss[loss=0.01742, audio_tagging_loss=0.01742, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4952045.50 frames. 
], batch size: 100, lr: 7.86e-03, grad_scale: 64.0 2023-12-22 05:24:38,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=422786.6666666667, ans=0.125 2023-12-22 05:24:42,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=422786.6666666667, ans=15.0 2023-12-22 05:24:47,337 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+01 2.660e+01 2.819e+01 2.961e+01 3.390e+01, threshold=5.637e+01, percent-clipped=0.0 2023-12-22 05:25:00,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=422920.0, ans=0.125 2023-12-22 05:25:09,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=422986.6666666667, ans=10.0 2023-12-22 05:25:16,642 INFO [train.py:886] (1/4) Epoch 14, batch 1500, loss[loss=0.01767, audio_tagging_loss=0.01767, over 24750.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4954233.01 frames. ], batch size: 99, lr: 7.86e-03, grad_scale: 64.0 2023-12-22 05:25:47,297 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.029e-01 2023-12-22 05:25:56,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=423253.3333333333, ans=0.09899494936611666 2023-12-22 05:26:09,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-12-22 05:26:09,766 INFO [train.py:886] (1/4) Epoch 14, batch 1550, loss[loss=0.01661, audio_tagging_loss=0.01661, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4949825.98 frames. 
], batch size: 99, lr: 7.86e-03, grad_scale: 64.0 2023-12-22 05:26:10,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=423386.6666666667, ans=0.2 2023-12-22 05:26:15,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=423386.6666666667, ans=0.0 2023-12-22 05:26:16,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=423386.6666666667, ans=0.0 2023-12-22 05:26:19,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=423453.3333333333, ans=0.0 2023-12-22 05:26:20,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=423453.3333333333, ans=0.2 2023-12-22 05:26:28,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=423520.0, ans=0.0 2023-12-22 05:26:30,349 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.731e+01 2.887e+01 3.049e+01 4.641e+01, threshold=5.774e+01, percent-clipped=0.0 2023-12-22 05:26:32,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=423520.0, ans=0.125 2023-12-22 05:26:35,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=423520.0, ans=0.125 2023-12-22 05:26:36,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2023-12-22 05:26:45,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=423586.6666666667, ans=0.125 2023-12-22 05:26:48,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=423586.6666666667, ans=0.0 2023-12-22 05:26:54,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=423653.3333333333, ans=0.125 2023-12-22 05:27:00,508 INFO [train.py:886] (1/4) Epoch 14, batch 1600, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4945015.66 frames. ], batch size: 99, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:27:07,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-12-22 05:27:18,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=423786.6666666667, ans=0.0 2023-12-22 05:27:31,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.88 vs. 
limit=15.0 2023-12-22 05:27:44,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=423986.6666666667, ans=10.0 2023-12-22 05:27:48,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=423986.6666666667, ans=0.2 2023-12-22 05:27:49,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2023-12-22 05:27:52,895 INFO [train.py:886] (1/4) Epoch 14, batch 1650, loss[loss=0.01164, audio_tagging_loss=0.01164, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4945188.90 frames. ], batch size: 100, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:27:53,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=424053.3333333333, ans=0.09899494936611666 2023-12-22 05:27:53,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=424053.3333333333, ans=0.125 2023-12-22 05:28:04,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=424120.0, ans=0.125 2023-12-22 05:28:14,932 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.709e+01 2.898e+01 3.040e+01 3.602e+01, threshold=5.796e+01, percent-clipped=0.0 2023-12-22 05:28:16,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=424186.6666666667, ans=0.125 2023-12-22 05:28:19,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=424186.6666666667, ans=0.2 2023-12-22 05:28:44,646 INFO [train.py:886] (1/4) Epoch 14, batch 1700, loss[loss=0.01478, audio_tagging_loss=0.01478, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4946990.17 frames. ], batch size: 100, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:28:53,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2023-12-22 05:29:19,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=12.0 2023-12-22 05:29:37,004 INFO [train.py:886] (1/4) Epoch 14, batch 1750, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4954617.42 frames. 
], batch size: 100, lr: 7.85e-03, grad_scale: 64.0 2023-12-22 05:29:46,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=424786.6666666667, ans=0.0 2023-12-22 05:29:59,041 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 2.657e+01 2.813e+01 2.952e+01 3.749e+01, threshold=5.627e+01, percent-clipped=0.0 2023-12-22 05:30:07,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=424920.0, ans=0.125 2023-12-22 05:30:27,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=424986.6666666667, ans=0.95 2023-12-22 05:30:28,729 INFO [train.py:886] (1/4) Epoch 14, batch 1800, loss[loss=0.01896, audio_tagging_loss=0.01896, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4951939.05 frames. ], batch size: 100, lr: 7.84e-03, grad_scale: 64.0 2023-12-22 05:30:37,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0 2023-12-22 05:30:38,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=425120.0, ans=0.125 2023-12-22 05:30:40,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=425120.0, ans=0.0 2023-12-22 05:30:51,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425186.6666666667, ans=0.125 2023-12-22 05:31:16,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=425320.0, ans=0.2 2023-12-22 05:31:20,761 INFO [train.py:886] (1/4) Epoch 14, batch 1850, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4945808.81 frames. ], batch size: 99, lr: 7.84e-03, grad_scale: 64.0 2023-12-22 05:31:20,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=425386.6666666667, ans=0.0 2023-12-22 05:31:23,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2023-12-22 05:31:27,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=425386.6666666667, ans=0.125 2023-12-22 05:31:41,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.85 vs. 
limit=22.5 2023-12-22 05:31:42,480 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.703e+01 2.874e+01 3.050e+01 3.710e+01, threshold=5.749e+01, percent-clipped=0.0 2023-12-22 05:31:50,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=425520.0, ans=0.125 2023-12-22 05:31:55,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=425586.6666666667, ans=0.0 2023-12-22 05:31:57,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. limit=10.0 2023-12-22 05:32:03,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425653.3333333333, ans=0.1 2023-12-22 05:32:12,711 INFO [train.py:886] (1/4) Epoch 14, batch 1900, loss[loss=0.01568, audio_tagging_loss=0.01568, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4939926.45 frames. ], batch size: 99, lr: 7.84e-03, grad_scale: 64.0 2023-12-22 05:32:35,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=425853.3333333333, ans=0.125 2023-12-22 05:32:36,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=425853.3333333333, ans=0.0 2023-12-22 05:32:54,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=425986.6666666667, ans=0.2 2023-12-22 05:32:59,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-22 05:33:04,808 INFO [train.py:886] (1/4) Epoch 14, batch 1950, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4936453.97 frames. ], batch size: 99, lr: 7.83e-03, grad_scale: 64.0 2023-12-22 05:33:20,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-12-22 05:33:26,065 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.762e+01 2.921e+01 3.064e+01 3.650e+01, threshold=5.841e+01, percent-clipped=0.0 2023-12-22 05:33:40,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=426253.3333333333, ans=0.125 2023-12-22 05:33:56,132 INFO [train.py:886] (1/4) Epoch 14, batch 2000, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24041.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4937527.05 frames. 
], batch size: 100, lr: 7.83e-03, grad_scale: 64.0
2023-12-22 05:34:01,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=426386.6666666667, ans=0.0
2023-12-22 05:34:03,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=426386.6666666667, ans=0.125
2023-12-22 05:34:28,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=426586.6666666667, ans=0.125
2023-12-22 05:34:44,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=426653.3333333333, ans=0.0
2023-12-22 05:34:48,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426720.0, ans=0.1
2023-12-22 05:34:49,246 INFO [train.py:886] (1/4) Epoch 14, batch 2050, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4943403.00 frames. ], batch size: 100, lr: 7.83e-03, grad_scale: 64.0
2023-12-22 05:34:55,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426720.0, ans=0.1
2023-12-22 05:35:00,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=426786.6666666667, ans=0.125
2023-12-22 05:35:11,159 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.714e+01 2.866e+01 3.005e+01 3.462e+01, threshold=5.732e+01, percent-clipped=0.0
2023-12-22 05:35:11,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426853.3333333333, ans=0.1
2023-12-22 05:35:23,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5
2023-12-22 05:35:27,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=426920.0, ans=0.125
2023-12-22 05:35:29,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5
2023-12-22 05:35:34,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=426986.6666666667, ans=0.125
2023-12-22 05:35:41,255 INFO [train.py:886] (1/4) Epoch 14, batch 2100, loss[loss=0.016, audio_tagging_loss=0.016, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4948748.72 frames. ], batch size: 100, lr: 7.82e-03, grad_scale: 64.0
2023-12-22 05:35:49,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0
2023-12-22 05:36:32,197 INFO [train.py:886] (1/4) Epoch 14, batch 2150, loss[loss=0.01756, audio_tagging_loss=0.01756, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4948510.17 frames. ], batch size: 100, lr: 7.82e-03, grad_scale: 64.0
2023-12-22 05:36:54,875 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 2.697e+01 2.884e+01 3.038e+01 3.535e+01, threshold=5.769e+01, percent-clipped=0.0
2023-12-22 05:37:05,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0
2023-12-22 05:37:18,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=427653.3333333333, ans=0.125
2023-12-22 05:37:20,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=22.5
2023-12-22 05:37:24,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=427720.0, ans=0.0
2023-12-22 05:37:25,205 INFO [train.py:886] (1/4) Epoch 14, batch 2200, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4948240.96 frames. ], batch size: 99, lr: 7.82e-03, grad_scale: 64.0
2023-12-22 05:37:28,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=427720.0, ans=0.0
2023-12-22 05:37:40,089 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 05:37:57,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=427920.0, ans=0.0
2023-12-22 05:38:01,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0
2023-12-22 05:38:04,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=427986.6666666667, ans=0.0
2023-12-22 05:38:10,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=427986.6666666667, ans=0.125
2023-12-22 05:38:17,243 INFO [train.py:886] (1/4) Epoch 14, batch 2250, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4942344.98 frames. ], batch size: 99, lr: 7.82e-03, grad_scale: 64.0
2023-12-22 05:38:31,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0
2023-12-22 05:38:36,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=428186.6666666667, ans=0.125
2023-12-22 05:38:37,736 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 2.726e+01 2.835e+01 3.020e+01 3.346e+01, threshold=5.670e+01, percent-clipped=0.0
2023-12-22 05:38:46,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=428253.3333333333, ans=0.0
2023-12-22 05:38:55,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=428253.3333333333, ans=0.125
2023-12-22 05:38:59,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=428320.0, ans=0.125
2023-12-22 05:39:01,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.85 vs. limit=22.5
2023-12-22 05:39:02,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=428320.0, ans=0.0
2023-12-22 05:39:06,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=428386.6666666667, ans=0.125
2023-12-22 05:39:07,322 INFO [train.py:886] (1/4) Epoch 14, batch 2300, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4944383.35 frames. ], batch size: 99, lr: 7.81e-03, grad_scale: 64.0
2023-12-22 05:39:15,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=428386.6666666667, ans=0.0
2023-12-22 05:39:27,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=428453.3333333333, ans=0.2
2023-12-22 05:39:47,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.31 vs. limit=15.0
2023-12-22 05:39:54,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=428653.3333333333, ans=0.125
2023-12-22 05:39:58,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=428653.3333333333, ans=0.125
2023-12-22 05:40:00,308 INFO [train.py:886] (1/4) Epoch 14, batch 2350, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4945154.78 frames. ], batch size: 100, lr: 7.81e-03, grad_scale: 64.0
2023-12-22 05:40:21,522 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.672e+01 2.834e+01 2.975e+01 3.666e+01, threshold=5.667e+01, percent-clipped=0.0
2023-12-22 05:40:28,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=428853.3333333333, ans=0.125
2023-12-22 05:40:51,854 INFO [train.py:886] (1/4) Epoch 14, batch 2400, loss[loss=0.01328, audio_tagging_loss=0.01328, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4945776.06 frames. ], batch size: 100, lr: 7.81e-03, grad_scale: 64.0
2023-12-22 05:40:54,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=429053.3333333333, ans=0.0
2023-12-22 05:40:55,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429053.3333333333, ans=0.1
2023-12-22 05:41:07,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=429120.0, ans=0.2
2023-12-22 05:41:22,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=12.0
2023-12-22 05:41:31,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=429253.3333333333, ans=0.125
2023-12-22 05:41:32,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=429320.0, ans=0.5
2023-12-22 05:41:44,339 INFO [train.py:886] (1/4) Epoch 14, batch 2450, loss[loss=0.01667, audio_tagging_loss=0.01667, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4948138.21 frames. ], batch size: 100, lr: 7.80e-03, grad_scale: 64.0
2023-12-22 05:41:54,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429453.3333333333, ans=0.1
2023-12-22 05:42:05,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0
2023-12-22 05:42:05,686 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.732e+01 2.868e+01 2.998e+01 3.609e+01, threshold=5.736e+01, percent-clipped=0.0
2023-12-22 05:42:28,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0
2023-12-22 05:42:35,902 INFO [train.py:886] (1/4) Epoch 14, batch 2500, loss[loss=0.01687, audio_tagging_loss=0.01687, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4946208.72 frames. ], batch size: 100, lr: 7.80e-03, grad_scale: 64.0
2023-12-22 05:42:40,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429720.0, ans=0.1
2023-12-22 05:42:48,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=429786.6666666667, ans=0.0
2023-12-22 05:42:50,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0
2023-12-22 05:42:50,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=22.5
2023-12-22 05:42:54,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=429786.6666666667, ans=0.125
2023-12-22 05:42:56,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0
2023-12-22 05:43:01,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2023-12-22 05:43:27,555 INFO [train.py:886] (1/4) Epoch 14, batch 2550, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4945338.16 frames. ], batch size: 99, lr: 7.80e-03, grad_scale: 64.0
2023-12-22 05:43:34,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=430053.3333333333, ans=0.125
2023-12-22 05:43:40,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=430120.0, ans=0.0
2023-12-22 05:43:43,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0
2023-12-22 05:43:49,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=430186.6666666667, ans=0.125
2023-12-22 05:43:50,204 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.742e+01 2.892e+01 3.076e+01 3.372e+01, threshold=5.784e+01, percent-clipped=0.0
2023-12-22 05:44:15,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=15.0
2023-12-22 05:44:20,781 INFO [train.py:886] (1/4) Epoch 14, batch 2600, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4948348.92 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0
2023-12-22 05:44:32,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0
2023-12-22 05:44:44,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=430520.0, ans=0.0
2023-12-22 05:44:46,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2023-12-22 05:44:52,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=430586.6666666667, ans=0.0
2023-12-22 05:44:54,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430586.6666666667, ans=0.1
2023-12-22 05:45:01,074 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.590e-03
2023-12-22 05:45:11,166 INFO [train.py:886] (1/4) Epoch 14, batch 2650, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4953994.72 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0
2023-12-22 05:45:11,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=430720.0, ans=0.07
2023-12-22 05:45:15,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430720.0, ans=0.1
2023-12-22 05:45:25,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=430786.6666666667, ans=0.125
2023-12-22 05:45:27,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=430786.6666666667, ans=0.2
2023-12-22 05:45:33,100 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.737e+01 2.885e+01 3.025e+01 3.317e+01, threshold=5.770e+01, percent-clipped=0.0
2023-12-22 05:45:58,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=430986.6666666667, ans=0.0
2023-12-22 05:46:03,593 INFO [train.py:886] (1/4) Epoch 14, batch 2700, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4956437.89 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0
2023-12-22 05:46:04,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=431053.3333333333, ans=0.2
2023-12-22 05:46:15,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431120.0, ans=0.125
2023-12-22 05:46:15,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431120.0, ans=0.1
2023-12-22 05:46:27,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=431186.6666666667, ans=0.125
2023-12-22 05:46:27,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=431186.6666666667, ans=0.125
2023-12-22 05:46:33,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=431186.6666666667, ans=0.0
2023-12-22 05:46:36,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=431253.3333333333, ans=0.09899494936611666
2023-12-22 05:46:42,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=431253.3333333333, ans=0.05
2023-12-22 05:46:47,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.051e-03
2023-12-22 05:46:49,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=431320.0, ans=0.125
2023-12-22 05:46:55,327 INFO [train.py:886] (1/4) Epoch 14, batch 2750, loss[loss=0.01554, audio_tagging_loss=0.01554, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4951336.44 frames. ], batch size: 99, lr: 7.79e-03, grad_scale: 64.0
2023-12-22 05:47:03,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0
2023-12-22 05:47:04,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. limit=22.5
2023-12-22 05:47:17,162 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.705e+01 2.817e+01 2.978e+01 3.595e+01, threshold=5.635e+01, percent-clipped=0.0
2023-12-22 05:47:23,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=431520.0, ans=0.125
2023-12-22 05:47:30,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=431586.6666666667, ans=12.0
2023-12-22 05:47:43,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=431653.3333333333, ans=0.125
2023-12-22 05:47:44,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431653.3333333333, ans=0.1
2023-12-22 05:47:45,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=431653.3333333333, ans=0.125
2023-12-22 05:47:46,633 INFO [train.py:886] (1/4) Epoch 14, batch 2800, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4955358.77 frames. ], batch size: 99, lr: 7.78e-03, grad_scale: 64.0
2023-12-22 05:47:50,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0
2023-12-22 05:48:07,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0
2023-12-22 05:48:09,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=431853.3333333333, ans=0.0
2023-12-22 05:48:13,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2023-12-22 05:48:22,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=431920.0, ans=0.0
2023-12-22 05:48:29,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=431986.6666666667, ans=0.0
2023-12-22 05:48:35,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=431986.6666666667, ans=0.125
2023-12-22 05:48:37,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=431986.6666666667, ans=0.0
2023-12-22 05:48:39,049 INFO [train.py:886] (1/4) Epoch 14, batch 2850, loss[loss=0.01607, audio_tagging_loss=0.01607, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4949604.80 frames. ], batch size: 99, lr: 7.78e-03, grad_scale: 64.0
2023-12-22 05:48:39,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=432053.3333333333, ans=0.0
2023-12-22 05:48:40,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2023-12-22 05:49:00,969 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.725e+01 2.911e+01 3.000e+01 3.725e+01, threshold=5.822e+01, percent-clipped=0.0
2023-12-22 05:49:02,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=432186.6666666667, ans=0.125
2023-12-22 05:49:13,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432253.3333333333, ans=0.1
2023-12-22 05:49:31,048 INFO [train.py:886] (1/4) Epoch 14, batch 2900, loss[loss=0.01523, audio_tagging_loss=0.01523, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4947520.71 frames. ], batch size: 100, lr: 7.78e-03, grad_scale: 64.0
2023-12-22 05:49:38,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=432386.6666666667, ans=0.2
2023-12-22 05:49:40,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=432453.3333333333, ans=0.125
2023-12-22 05:49:49,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0
2023-12-22 05:50:07,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=432586.6666666667, ans=0.2
2023-12-22 05:50:08,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=432586.6666666667, ans=0.125
2023-12-22 05:50:15,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=15.0
2023-12-22 05:50:17,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0
2023-12-22 05:50:20,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=432653.3333333333, ans=0.0
2023-12-22 05:50:22,855 INFO [train.py:886] (1/4) Epoch 14, batch 2950, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4951547.33 frames. ], batch size: 99, lr: 7.77e-03, grad_scale: 64.0
2023-12-22 05:50:35,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0
2023-12-22 05:50:43,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=432853.3333333333, ans=0.2
2023-12-22 05:50:44,825 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.760e+01 2.879e+01 3.048e+01 3.785e+01, threshold=5.759e+01, percent-clipped=0.0
2023-12-22 05:50:48,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432853.3333333333, ans=0.1
2023-12-22 05:51:03,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=432986.6666666667, ans=0.0
2023-12-22 05:51:14,401 INFO [train.py:886] (1/4) Epoch 14, batch 3000, loss[loss=0.0156, audio_tagging_loss=0.0156, over 25000.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4956912.24 frames. ], batch size: 100, lr: 7.77e-03, grad_scale: 64.0
2023-12-22 05:51:14,402 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 05:51:35,572 INFO [train.py:917] (1/4) Epoch 14, validation: loss=0.03344, audio_tagging_loss=0.03344, over 3737520.00 frames.
2023-12-22 05:51:35,573 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-22 05:51:44,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433120.0, ans=0.1
2023-12-22 05:51:44,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=433120.0, ans=0.125
2023-12-22 05:51:50,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=433120.0, ans=0.125
2023-12-22 05:51:51,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=433120.0, ans=0.125
2023-12-22 05:52:26,664 INFO [train.py:886] (1/4) Epoch 14, batch 3050, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4964415.85 frames. ], batch size: 100, lr: 7.77e-03, grad_scale: 64.0
2023-12-22 05:52:37,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433453.3333333333, ans=0.1
2023-12-22 05:52:49,191 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.748e+01 2.853e+01 3.040e+01 4.569e+01, threshold=5.706e+01, percent-clipped=0.0
2023-12-22 05:52:49,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=433520.0, ans=0.125
2023-12-22 05:53:19,729 INFO [train.py:886] (1/4) Epoch 14, batch 3100, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4963504.00 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0
2023-12-22 05:53:27,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433720.0, ans=0.1
2023-12-22 05:53:50,340 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 05:53:51,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=433920.0, ans=0.0
2023-12-22 05:53:53,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=433920.0, ans=0.0
2023-12-22 05:54:04,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=433986.6666666667, ans=0.125
2023-12-22 05:54:09,526 INFO [train.py:886] (1/4) Epoch 14, batch 3150, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4950518.38 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0
2023-12-22 05:54:10,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0
2023-12-22 05:54:23,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=12.0
2023-12-22 05:54:24,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.84 vs. limit=22.5
2023-12-22 05:54:30,685 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.796e+01 2.957e+01 3.108e+01 3.516e+01, threshold=5.914e+01, percent-clipped=0.0
2023-12-22 05:54:36,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5
2023-12-22 05:54:48,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434253.3333333333, ans=0.1
2023-12-22 05:54:57,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0
2023-12-22 05:55:01,151 INFO [train.py:886] (1/4) Epoch 14, batch 3200, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4947616.59 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0
2023-12-22 05:55:53,581 INFO [train.py:886] (1/4) Epoch 14, batch 3250, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4944245.79 frames. ], batch size: 100, lr: 7.76e-03, grad_scale: 64.0
2023-12-22 05:55:54,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=434720.0, ans=0.125
2023-12-22 05:55:54,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=434720.0, ans=0.05
2023-12-22 05:56:11,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2023-12-22 05:56:14,174 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.695e+01 2.854e+01 3.040e+01 5.272e+01, threshold=5.707e+01, percent-clipped=0.0
2023-12-22 05:56:35,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0
2023-12-22 05:56:42,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=434986.6666666667, ans=0.07
2023-12-22 05:56:44,363 INFO [train.py:886] (1/4) Epoch 14, batch 3300, loss[loss=0.01369, audio_tagging_loss=0.01369, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4941467.31 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0
2023-12-22 05:56:44,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=435053.3333333333, ans=0.0
2023-12-22 05:56:52,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=435053.3333333333, ans=0.125
2023-12-22 05:57:08,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=435186.6666666667, ans=0.2
2023-12-22 05:57:12,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=435186.6666666667, ans=0.125
2023-12-22 05:57:14,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=435253.3333333333, ans=0.05
2023-12-22 05:57:24,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0
2023-12-22 05:57:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=435320.0, ans=0.0
2023-12-22 05:57:37,375 INFO [train.py:886] (1/4) Epoch 14, batch 3350, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4937939.47 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0
2023-12-22 05:57:59,784 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.712e+01 2.830e+01 3.003e+01 3.619e+01, threshold=5.660e+01, percent-clipped=0.0
2023-12-22 05:58:21,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=435653.3333333333, ans=0.0
2023-12-22 05:58:27,779 INFO [train.py:886] (1/4) Epoch 14, batch 3400, loss[loss=0.01581, audio_tagging_loss=0.01581, over 24939.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4946121.29 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0
2023-12-22 05:58:35,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=435720.0, ans=0.125
2023-12-22 05:58:40,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=435786.6666666667, ans=0.2
2023-12-22 05:58:53,638 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 05:59:01,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435920.0, ans=0.0
2023-12-22 05:59:03,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=435920.0, ans=0.125
2023-12-22 05:59:06,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435920.0, ans=0.1
2023-12-22 05:59:15,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=435986.6666666667, ans=0.125
2023-12-22 05:59:20,167 INFO [train.py:886] (1/4) Epoch 14, batch 3450, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4945977.06 frames. ], batch size: 99, lr: 7.74e-03, grad_scale: 64.0
2023-12-22 05:59:23,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0
2023-12-22 05:59:24,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=22.5
2023-12-22 05:59:26,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=436053.3333333333, ans=0.125
2023-12-22 05:59:43,842 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.742e+01 2.883e+01 3.018e+01 3.834e+01, threshold=5.765e+01, percent-clipped=0.0
2023-12-22 05:59:54,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=436253.3333333333, ans=0.0
2023-12-22 06:00:05,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=436320.0, ans=0.2
2023-12-22 06:00:05,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=436320.0, ans=0.125
2023-12-22 06:00:05,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0
2023-12-22 06:00:09,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=436320.0, ans=0.125
2023-12-22 06:00:11,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0
2023-12-22 06:00:13,205 INFO [train.py:886] (1/4) Epoch 14, batch 3500, loss[loss=0.01426, audio_tagging_loss=0.01426, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4940787.68 frames. ], batch size: 99, lr: 7.74e-03, grad_scale: 32.0
2023-12-22 06:00:14,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=436386.6666666667, ans=0.2
2023-12-22 06:00:26,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=436453.3333333333, ans=0.0
2023-12-22 06:00:38,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0
2023-12-22 06:00:43,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=436586.6666666667, ans=0.0
2023-12-22 06:00:45,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.17 vs. limit=22.5
2023-12-22 06:00:47,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=436586.6666666667, ans=10.0
2023-12-22 06:01:02,849 INFO [train.py:886] (1/4) Epoch 14, batch 3550, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4938127.97 frames. ], batch size: 100, lr: 7.74e-03, grad_scale: 32.0
2023-12-22 06:01:16,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=436786.6666666667, ans=0.1
2023-12-22 06:01:26,843 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.689e+01 2.829e+01 3.028e+01 3.560e+01, threshold=5.658e+01, percent-clipped=0.0
2023-12-22 06:01:34,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=436920.0, ans=0.07
2023-12-22 06:01:47,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=436986.6666666667, ans=0.0
2023-12-22 06:01:53,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=437053.3333333333, ans=0.0
2023-12-22 06:01:54,630 INFO [train.py:886] (1/4) Epoch 14, batch 3600, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4944169.35 frames. ], batch size: 100, lr: 7.74e-03, grad_scale: 32.0
2023-12-22 06:01:54,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437053.3333333333, ans=0.125
2023-12-22 06:01:56,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=437053.3333333333, ans=0.125
2023-12-22 06:02:07,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=437120.0, ans=0.0
2023-12-22 06:02:16,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.33 vs. limit=22.5
2023-12-22 06:02:22,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. limit=15.0
2023-12-22 06:02:44,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=437320.0, ans=0.07
2023-12-22 06:02:46,116 INFO [train.py:886] (1/4) Epoch 14, batch 3650, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4948762.37 frames. ], batch size: 100, lr: 7.73e-03, grad_scale: 32.0
2023-12-22 06:03:02,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=437453.3333333333, ans=0.0
2023-12-22 06:03:05,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=437520.0, ans=0.0
2023-12-22 06:03:06,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0
2023-12-22 06:03:08,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=437520.0, ans=0.0
2023-12-22 06:03:09,463 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.669e+01 2.809e+01 2.947e+01 3.516e+01, threshold=5.618e+01, percent-clipped=0.0
2023-12-22 06:03:09,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437520.0, ans=0.1
2023-12-22 06:03:16,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=437586.6666666667, ans=0.125
2023-12-22 06:03:26,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=437586.6666666667, ans=0.2
2023-12-22 06:03:26,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5
2023-12-22 06:03:38,037 INFO [train.py:886] (1/4) Epoch 14, batch 3700, loss[loss=0.01626, audio_tagging_loss=0.01626, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4950023.11 frames. ], batch size: 100, lr: 7.73e-03, grad_scale: 32.0
2023-12-22 06:03:41,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0
2023-12-22 06:03:42,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=437720.0, ans=0.125
2023-12-22 06:03:56,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0
2023-12-22 06:03:56,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0
2023-12-22 06:04:01,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=437853.3333333333, ans=0.125
2023-12-22 06:04:04,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=437853.3333333333, ans=0.2
2023-12-22 06:04:07,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0
2023-12-22 06:04:12,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=437920.0, ans=0.2
2023-12-22 06:04:12,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=437920.0, ans=0.0
2023-12-22 06:04:14,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=437920.0, ans=15.0
2023-12-22 06:04:24,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0
2023-12-22 06:04:30,270 INFO [train.py:886] (1/4) Epoch 14, batch 3750, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4947365.46 frames. ], batch size: 100, lr: 7.73e-03, grad_scale: 32.0
2023-12-22 06:04:52,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=438186.6666666667, ans=0.0
2023-12-22 06:04:54,121 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.790e+01 2.895e+01 3.050e+01 3.553e+01, threshold=5.791e+01, percent-clipped=0.0
2023-12-22 06:04:57,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0
2023-12-22 06:04:58,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=438186.6666666667, ans=0.0
2023-12-22 06:05:22,192 INFO [train.py:886] (1/4) Epoch 14, batch 3800, loss[loss=0.0172, audio_tagging_loss=0.0172, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4944377.67 frames. ], batch size: 99, lr: 7.72e-03, grad_scale: 32.0
2023-12-22 06:05:25,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=438386.6666666667, ans=0.0
2023-12-22 06:05:26,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=438386.6666666667, ans=0.1
2023-12-22 06:05:32,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=438453.3333333333, ans=0.1
2023-12-22 06:05:39,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438453.3333333333, ans=0.1
2023-12-22 06:05:39,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0
2023-12-22 06:06:14,379 INFO [train.py:886] (1/4) Epoch 14, batch 3850, loss[loss=0.01463, audio_tagging_loss=0.01463, over 22629.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4941863.80 frames. ], batch size: 107, lr: 7.72e-03, grad_scale: 32.0
2023-12-22 06:06:25,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438786.6666666667, ans=0.1
2023-12-22 06:06:33,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=438786.6666666667, ans=0.2
2023-12-22 06:06:38,160 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 2.739e+01 2.908e+01 3.099e+01 3.536e+01, threshold=5.817e+01, percent-clipped=0.0
2023-12-22 06:06:43,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0
2023-12-22 06:07:06,026 INFO [train.py:886] (1/4) Epoch 14, batch 3900, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4943768.17 frames. ], batch size: 100, lr: 7.72e-03, grad_scale: 32.0
2023-12-22 06:07:12,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=439053.3333333333, ans=0.2
2023-12-22 06:07:34,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=439186.6666666667, ans=0.1
2023-12-22 06:07:37,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.50 vs. limit=12.0
2023-12-22 06:07:57,865 INFO [train.py:886] (1/4) Epoch 14, batch 3950, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4949778.69 frames. ], batch size: 100, lr: 7.72e-03, grad_scale: 32.0
2023-12-22 06:08:01,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439386.6666666667, ans=0.1
2023-12-22 06:08:09,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=439453.3333333333, ans=0.0
2023-12-22 06:08:17,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0
2023-12-22 06:08:19,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439520.0, ans=0.1
2023-12-22 06:08:22,151 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 2.684e+01 2.825e+01 2.982e+01 3.429e+01, threshold=5.651e+01, percent-clipped=0.0
2023-12-22 06:08:31,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439586.6666666667, ans=0.125
2023-12-22 06:08:39,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5
2023-12-22 06:08:40,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=439653.3333333333, ans=0.0
2023-12-22 06:08:45,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=439653.3333333333, ans=0.125
2023-12-22 06:08:45,965 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.266e-02
2023-12-22 06:08:50,463 INFO [train.py:886] (1/4) Epoch 14, batch 4000, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4947957.93 frames. ], batch size: 100, lr: 7.71e-03, grad_scale: 32.0
2023-12-22 06:08:51,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=439720.0, ans=0.0
2023-12-22 06:09:12,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=439853.3333333333, ans=0.2
2023-12-22 06:09:29,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439920.0, ans=0.1
2023-12-22 06:09:42,997 INFO [train.py:886] (1/4) Epoch 14, batch 4050, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4951980.81 frames. ], batch size: 99, lr: 7.71e-03, grad_scale: 32.0
2023-12-22 06:09:46,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=440053.3333333333, ans=0.125
2023-12-22 06:09:55,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=22.5
2023-12-22 06:09:57,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440120.0, ans=0.1
2023-12-22 06:10:06,328 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+01 2.790e+01 2.955e+01 3.071e+01 3.568e+01, threshold=5.910e+01, percent-clipped=0.0
2023-12-22 06:10:11,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=440186.6666666667, ans=0.2
2023-12-22 06:10:12,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=440253.3333333333, ans=0.125
2023-12-22 06:10:17,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.62 vs. limit=15.0
2023-12-22 06:10:27,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0
2023-12-22 06:10:31,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0
2023-12-22 06:10:33,724 INFO [train.py:886] (1/4) Epoch 14, batch 4100, loss[loss=0.01565, audio_tagging_loss=0.01565, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4949782.46 frames. ], batch size: 99, lr: 7.71e-03, grad_scale: 32.0
2023-12-22 06:10:37,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=440386.6666666667, ans=0.0
2023-12-22 06:10:47,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=440453.3333333333, ans=0.0
2023-12-22 06:11:11,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=440586.6666666667, ans=0.2
2023-12-22 06:11:26,738 INFO [train.py:886] (1/4) Epoch 14, batch 4150, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4945533.51 frames. ], batch size: 99, lr: 7.70e-03, grad_scale: 32.0
2023-12-22 06:11:26,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=440720.0, ans=0.125
2023-12-22 06:11:31,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=440720.0, ans=0.0
2023-12-22 06:11:50,566 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.748e+01 2.880e+01 2.984e+01 4.734e+01, threshold=5.760e+01, percent-clipped=0.0
2023-12-22 06:12:02,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=440920.0, ans=0.125
2023-12-22 06:12:04,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=440920.0, ans=0.0
2023-12-22 06:12:07,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=440986.6666666667, ans=0.0
2023-12-22 06:12:17,772 INFO [train.py:886] (1/4) Epoch 14, batch 4200, loss[loss=0.01728, audio_tagging_loss=0.01728, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4953244.21 frames. ], batch size: 100, lr: 7.70e-03, grad_scale: 32.0
2023-12-22 06:12:21,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0
2023-12-22 06:12:23,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=441053.3333333333, ans=0.05
2023-12-22 06:12:29,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=441120.0, ans=0.0
2023-12-22 06:12:29,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5
2023-12-22 06:12:35,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=441120.0, ans=0.1
2023-12-22 06:12:37,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=441186.6666666667, ans=0.0
2023-12-22 06:12:45,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=441186.6666666667, ans=0.125
2023-12-22 06:12:53,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=441253.3333333333, ans=0.0
2023-12-22 06:13:04,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=441320.0, ans=0.125
2023-12-22 06:13:04,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=441320.0, ans=0.0
2023-12-22 06:13:06,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=441320.0, ans=0.125
2023-12-22 06:13:10,164 INFO [train.py:886] (1/4) Epoch 14, batch 4250, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4957567.79 frames. ], batch size: 99, lr: 7.70e-03, grad_scale: 32.0
2023-12-22 06:13:10,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441386.6666666667, ans=0.1
2023-12-22 06:13:11,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=441386.6666666667, ans=0.0
2023-12-22 06:13:24,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=441453.3333333333, ans=0.125
2023-12-22 06:13:35,109 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.735e+01 2.849e+01 2.986e+01 3.517e+01, threshold=5.698e+01, percent-clipped=0.0
2023-12-22 06:13:55,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=441653.3333333333, ans=0.125
2023-12-22 06:13:59,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=441653.3333333333, ans=0.125
2023-12-22 06:14:02,685 INFO [train.py:886] (1/4) Epoch 14, batch 4300, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4960094.43 frames. ], batch size: 100, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:14:13,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=441786.6666666667, ans=0.2
2023-12-22 06:14:23,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=441853.3333333333, ans=0.2
2023-12-22 06:14:27,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=441853.3333333333, ans=0.125
2023-12-22 06:14:45,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441986.6666666667, ans=0.1
2023-12-22 06:14:46,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=441986.6666666667, ans=0.125
2023-12-22 06:14:48,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=441986.6666666667, ans=0.04949747468305833
2023-12-22 06:14:53,350 INFO [train.py:886] (1/4) Epoch 14, batch 4350, loss[loss=0.01525, audio_tagging_loss=0.01525, over 24750.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4954767.42 frames. ], batch size: 99, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:15:15,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=442186.6666666667, ans=0.0
2023-12-22 06:15:17,198 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+01 2.829e+01 2.968e+01 3.125e+01 3.795e+01, threshold=5.936e+01, percent-clipped=0.0
2023-12-22 06:15:42,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0
2023-12-22 06:15:44,762 INFO [train.py:886] (1/4) Epoch 14, batch 4400, loss[loss=0.01723, audio_tagging_loss=0.01723, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4952105.99 frames. ], batch size: 100, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:15:45,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=442386.6666666667, ans=0.0
2023-12-22 06:15:57,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2023-12-22 06:16:02,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0
2023-12-22 06:16:10,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=442520.0, ans=0.125
2023-12-22 06:16:12,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=442520.0, ans=0.125
2023-12-22 06:16:20,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=442586.6666666667, ans=0.2
2023-12-22 06:16:25,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=15.0
2023-12-22 06:16:30,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442653.3333333333, ans=0.1
2023-12-22 06:16:35,475 INFO [train.py:886] (1/4) Epoch 14, batch 4450, loss[loss=0.01506, audio_tagging_loss=0.01506, over 23952.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4943152.74 frames. ], batch size: 100, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:16:42,925 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.199e-03
2023-12-22 06:16:46,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=442786.6666666667, ans=0.125
2023-12-22 06:16:59,496 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.711e+01 2.908e+01 3.101e+01 3.699e+01, threshold=5.817e+01, percent-clipped=0.0
2023-12-22 06:17:01,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=442853.3333333333, ans=0.0
2023-12-22 06:17:20,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442986.6666666667, ans=0.1
2023-12-22 06:17:27,854 INFO [train.py:886] (1/4) Epoch 14, batch 4500, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4939280.85 frames. ], batch size: 99, lr: 7.68e-03, grad_scale: 32.0
2023-12-22 06:17:52,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=443186.6666666667, ans=0.2
2023-12-22 06:18:03,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=443253.3333333333, ans=0.125
2023-12-22 06:18:20,095 INFO [train.py:886] (1/4) Epoch 14, batch 4550, loss[loss=0.009976, audio_tagging_loss=0.009976, over 22337.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4934650.83 frames. ], batch size: 107, lr: 7.68e-03, grad_scale: 32.0
2023-12-22 06:18:22,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=443386.6666666667, ans=0.2
2023-12-22 06:18:29,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=443453.3333333333, ans=0.125
2023-12-22 06:18:32,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=443453.3333333333, ans=0.125
2023-12-22 06:18:43,104 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.709e+01 2.871e+01 3.035e+01 3.525e+01, threshold=5.741e+01, percent-clipped=0.0
2023-12-22 06:18:45,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=443520.0, ans=0.0
2023-12-22 06:18:56,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5
2023-12-22 06:19:03,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443653.3333333333, ans=0.1
2023-12-22 06:19:11,007 INFO [train.py:886] (1/4) Epoch 14, batch 4600, loss[loss=0.01742, audio_tagging_loss=0.01742, over 24903.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4932581.91 frames. ], batch size: 100, lr: 7.68e-03, grad_scale: 32.0
2023-12-22 06:19:23,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=443786.6666666667, ans=0.125
2023-12-22 06:19:36,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443853.3333333333, ans=0.1
2023-12-22 06:19:49,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0
2023-12-22 06:20:03,996 INFO [train.py:886] (1/4) Epoch 14, batch 4650, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4936609.55 frames. ], batch size: 100, lr: 7.67e-03, grad_scale: 32.0
2023-12-22 06:20:20,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0
2023-12-22 06:20:27,282 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.504e+01 2.730e+01 2.882e+01 3.030e+01 3.512e+01, threshold=5.764e+01, percent-clipped=0.0
2023-12-22 06:20:29,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=444186.6666666667, ans=0.0
2023-12-22 06:20:53,824 INFO [train.py:886] (1/4) Epoch 14, batch 4700, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4935529.50 frames. ], batch size: 100, lr: 7.67e-03, grad_scale: 32.0
2023-12-22 06:21:04,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444453.3333333333, ans=0.0
2023-12-22 06:21:04,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=444453.3333333333, ans=0.125
2023-12-22 06:21:12,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0
2023-12-22 06:21:15,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444520.0, ans=0.1
2023-12-22 06:21:41,572 INFO [train.py:886] (1/4) Epoch 14, batch 4750, loss[loss=0.01485, audio_tagging_loss=0.01485, over 24750.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4935757.63 frames. ], batch size: 99, lr: 7.67e-03, grad_scale: 32.0
2023-12-22 06:21:44,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444720.0, ans=0.1
2023-12-22 06:22:18,552 INFO [train.py:886] (1/4) Epoch 15, batch 0, loss[loss=0.0314, audio_tagging_loss=0.0314, over 25000.00 frames. ], tot_loss[loss=0.0314, audio_tagging_loss=0.0314, over 25000.00 frames. ], batch size: 100, lr: 7.41e-03, grad_scale: 32.0
2023-12-22 06:22:18,552 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 06:22:40,002 INFO [train.py:917] (1/4) Epoch 15, validation: loss=0.03275, audio_tagging_loss=0.03275, over 3737520.00 frames.
2023-12-22 06:22:40,003 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 06:22:40,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=444826.6666666667, ans=0.125 2023-12-22 06:22:46,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444826.6666666667, ans=0.125 2023-12-22 06:22:47,358 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.805e+01 2.969e+01 3.103e+01 9.102e+01, threshold=5.939e+01, percent-clipped=6.0 2023-12-22 06:23:00,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=444960.0, ans=0.07 2023-12-22 06:23:24,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=445093.3333333333, ans=0.0 2023-12-22 06:23:29,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=445093.3333333333, ans=0.07 2023-12-22 06:23:31,624 INFO [train.py:886] (1/4) Epoch 15, batch 50, loss[loss=0.0187, audio_tagging_loss=0.0187, over 25000.00 frames. ], tot_loss[loss=0.02285, audio_tagging_loss=0.02285, over 1113619.48 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:23:53,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-12-22 06:23:55,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=445293.3333333333, ans=0.0 2023-12-22 06:24:11,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2023-12-22 06:24:16,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=445426.6666666667, ans=0.0 2023-12-22 06:24:17,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=445426.6666666667, ans=0.0 2023-12-22 06:24:23,181 INFO [train.py:886] (1/4) Epoch 15, batch 100, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01965, audio_tagging_loss=0.01965, over 1972283.29 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:24:27,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=445493.3333333333, ans=0.125 2023-12-22 06:24:30,485 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+01 3.116e+01 3.356e+01 3.817e+01 5.461e+01, threshold=6.711e+01, percent-clipped=0.0 2023-12-22 06:24:44,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.43 vs. 
limit=12.0 2023-12-22 06:24:44,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=445626.6666666667, ans=0.0 2023-12-22 06:24:44,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=445626.6666666667, ans=0.125 2023-12-22 06:24:47,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=12.0 2023-12-22 06:25:14,550 INFO [train.py:886] (1/4) Epoch 15, batch 150, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01783, audio_tagging_loss=0.01783, over 2642774.14 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:25:30,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=445893.3333333333, ans=0.125 2023-12-22 06:25:44,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=445960.0, ans=0.125 2023-12-22 06:25:52,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=446026.6666666667, ans=0.125 2023-12-22 06:26:06,068 INFO [train.py:886] (1/4) Epoch 15, batch 200, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 3158296.42 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0 2023-12-22 06:26:13,376 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+01 2.814e+01 2.983e+01 3.104e+01 3.592e+01, threshold=5.965e+01, percent-clipped=0.0 2023-12-22 06:26:24,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=446226.6666666667, ans=0.0 2023-12-22 06:26:53,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=446426.6666666667, ans=0.125 2023-12-22 06:26:56,930 INFO [train.py:886] (1/4) Epoch 15, batch 250, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24070.00 frames. ], tot_loss[loss=0.01606, audio_tagging_loss=0.01606, over 3562063.91 frames. ], batch size: 100, lr: 7.39e-03, grad_scale: 32.0 2023-12-22 06:27:27,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=446693.3333333333, ans=0.0 2023-12-22 06:27:38,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=446760.0, ans=0.0 2023-12-22 06:27:43,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=446760.0, ans=0.2 2023-12-22 06:27:43,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0 2023-12-22 06:27:50,304 INFO [train.py:886] (1/4) Epoch 15, batch 300, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 3867814.89 frames. 
], batch size: 99, lr: 7.39e-03, grad_scale: 32.0 2023-12-22 06:27:57,055 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+01 2.728e+01 2.846e+01 3.000e+01 3.484e+01, threshold=5.691e+01, percent-clipped=0.0 2023-12-22 06:28:05,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=446893.3333333333, ans=0.1 2023-12-22 06:28:15,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=446960.0, ans=0.0 2023-12-22 06:28:42,187 INFO [train.py:886] (1/4) Epoch 15, batch 350, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4097596.31 frames. ], batch size: 99, lr: 7.39e-03, grad_scale: 32.0 2023-12-22 06:28:43,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-12-22 06:28:51,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=447160.0, ans=0.125 2023-12-22 06:29:00,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=447226.6666666667, ans=0.125 2023-12-22 06:29:27,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=447426.6666666667, ans=0.125 2023-12-22 06:29:29,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2023-12-22 06:29:34,132 INFO [train.py:886] (1/4) Epoch 15, batch 400, loss[loss=0.01356, audio_tagging_loss=0.01356, over 22575.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4282122.46 frames. ], batch size: 107, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:29:36,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447493.3333333333, ans=0.125 2023-12-22 06:29:38,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=447493.3333333333, ans=0.0 2023-12-22 06:29:41,449 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.769e+01 2.874e+01 3.024e+01 3.342e+01, threshold=5.748e+01, percent-clipped=0.0 2023-12-22 06:29:56,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=447626.6666666667, ans=0.2 2023-12-22 06:30:04,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=447693.3333333333, ans=0.0 2023-12-22 06:30:11,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=447693.3333333333, ans=0.2 2023-12-22 06:30:24,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-12-22 06:30:26,479 INFO [train.py:886] (1/4) Epoch 15, batch 450, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4433658.61 frames. 
], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:30:29,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=447826.6666666667, ans=0.04949747468305833 2023-12-22 06:30:34,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2023-12-22 06:30:37,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2023-12-22 06:30:53,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=447960.0, ans=0.125 2023-12-22 06:31:01,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=448026.6666666667, ans=0.0 2023-12-22 06:31:06,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=448026.6666666667, ans=0.0 2023-12-22 06:31:18,206 INFO [train.py:886] (1/4) Epoch 15, batch 500, loss[loss=0.01339, audio_tagging_loss=0.01339, over 23970.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4551182.74 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:31:23,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448160.0, ans=0.0 2023-12-22 06:31:25,362 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.715e+01 2.862e+01 2.998e+01 3.574e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-22 06:31:42,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=448293.3333333333, ans=0.0 2023-12-22 06:32:08,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=448426.6666666667, ans=0.2 2023-12-22 06:32:10,779 INFO [train.py:886] (1/4) Epoch 15, batch 550, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4645648.02 frames. 
], batch size: 100, lr: 7.38e-03, grad_scale: 32.0 2023-12-22 06:32:12,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448493.3333333333, ans=0.1 2023-12-22 06:32:13,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448493.3333333333, ans=0.1 2023-12-22 06:32:19,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448560.0, ans=0.1 2023-12-22 06:32:26,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=448560.0, ans=0.125 2023-12-22 06:32:29,520 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.364e-02 2023-12-22 06:32:34,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=448626.6666666667, ans=0.125 2023-12-22 06:32:36,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=448626.6666666667, ans=0.0 2023-12-22 06:32:38,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. limit=10.0 2023-12-22 06:32:40,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448693.3333333333, ans=0.1 2023-12-22 06:32:48,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=448693.3333333333, ans=0.0 2023-12-22 06:32:53,583 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.261e-02 2023-12-22 06:33:02,507 INFO [train.py:886] (1/4) Epoch 15, batch 600, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4711619.92 frames. ], batch size: 99, lr: 7.37e-03, grad_scale: 32.0 2023-12-22 06:33:09,765 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.782e+01 2.890e+01 3.094e+01 3.722e+01, threshold=5.781e+01, percent-clipped=0.0 2023-12-22 06:33:12,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=448893.3333333333, ans=0.125 2023-12-22 06:33:18,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=448893.3333333333, ans=0.125 2023-12-22 06:33:44,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=449093.3333333333, ans=0.1 2023-12-22 06:33:54,135 INFO [train.py:886] (1/4) Epoch 15, batch 650, loss[loss=0.01629, audio_tagging_loss=0.01629, over 21933.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4753782.36 frames. 
], batch size: 107, lr: 7.37e-03, grad_scale: 32.0 2023-12-22 06:33:55,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=449160.0, ans=0.125 2023-12-22 06:33:57,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=449160.0, ans=0.125 2023-12-22 06:34:00,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449160.0, ans=0.1 2023-12-22 06:34:41,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=449426.6666666667, ans=0.2 2023-12-22 06:34:46,667 INFO [train.py:886] (1/4) Epoch 15, batch 700, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4793658.83 frames. ], batch size: 99, lr: 7.37e-03, grad_scale: 32.0 2023-12-22 06:34:53,987 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.765e+01 2.933e+01 3.097e+01 3.905e+01, threshold=5.867e+01, percent-clipped=0.0 2023-12-22 06:34:57,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0 2023-12-22 06:35:01,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=449560.0, ans=10.0 2023-12-22 06:35:03,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=449560.0, ans=0.2 2023-12-22 06:35:03,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=449560.0, ans=0.2 2023-12-22 06:35:13,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=449626.6666666667, ans=0.125 2023-12-22 06:35:22,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=449693.3333333333, ans=0.0 2023-12-22 06:35:38,029 INFO [train.py:886] (1/4) Epoch 15, batch 750, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4833685.56 frames. ], batch size: 100, lr: 7.37e-03, grad_scale: 64.0 2023-12-22 06:35:38,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=449826.6666666667, ans=0.125 2023-12-22 06:35:38,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=449826.6666666667, ans=0.125 2023-12-22 06:35:43,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=449826.6666666667, ans=0.125 2023-12-22 06:35:50,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=449893.3333333333, ans=0.5 2023-12-22 06:35:52,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. 
limit=6.0 2023-12-22 06:35:54,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449893.3333333333, ans=0.125 2023-12-22 06:35:56,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449893.3333333333, ans=0.125 2023-12-22 06:36:00,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=449960.0, ans=0.125 2023-12-22 06:36:06,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=449960.0, ans=0.125 2023-12-22 06:36:16,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450026.6666666667, ans=0.1 2023-12-22 06:36:21,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=450093.3333333333, ans=0.0 2023-12-22 06:36:23,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=450093.3333333333, ans=0.0 2023-12-22 06:36:29,746 INFO [train.py:886] (1/4) Epoch 15, batch 800, loss[loss=0.01491, audio_tagging_loss=0.01491, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4857857.36 frames. ], batch size: 99, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:36:35,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=450160.0, ans=0.125 2023-12-22 06:36:36,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=450160.0, ans=0.0 2023-12-22 06:36:37,138 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 2.693e+01 2.864e+01 3.009e+01 3.417e+01, threshold=5.729e+01, percent-clipped=0.0 2023-12-22 06:36:39,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2023-12-22 06:36:39,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=450226.6666666667, ans=0.95 2023-12-22 06:36:50,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=450293.3333333333, ans=0.2 2023-12-22 06:36:53,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450293.3333333333, ans=0.125 2023-12-22 06:37:10,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2023-12-22 06:37:17,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=450426.6666666667, ans=0.2 2023-12-22 06:37:22,192 INFO [train.py:886] (1/4) Epoch 15, batch 850, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4883117.04 frames. 
], batch size: 100, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:37:36,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=450560.0, ans=10.0 2023-12-22 06:37:47,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=450626.6666666667, ans=0.125 2023-12-22 06:38:14,322 INFO [train.py:886] (1/4) Epoch 15, batch 900, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4897431.31 frames. ], batch size: 100, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:38:19,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450826.6666666667, ans=0.1 2023-12-22 06:38:21,712 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 2.789e+01 2.921e+01 3.060e+01 3.433e+01, threshold=5.842e+01, percent-clipped=0.0 2023-12-22 06:38:21,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=450826.6666666667, ans=0.2 2023-12-22 06:38:27,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450893.3333333333, ans=0.0 2023-12-22 06:38:27,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450893.3333333333, ans=0.125 2023-12-22 06:38:38,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450960.0, ans=0.1 2023-12-22 06:38:41,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=450960.0, ans=0.05 2023-12-22 06:39:06,216 INFO [train.py:886] (1/4) Epoch 15, batch 950, loss[loss=0.01093, audio_tagging_loss=0.01093, over 24058.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4898749.69 frames. ], batch size: 100, lr: 7.36e-03, grad_scale: 64.0 2023-12-22 06:39:07,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=451160.0, ans=0.0 2023-12-22 06:39:27,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=451293.3333333333, ans=0.0 2023-12-22 06:39:57,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=451426.6666666667, ans=0.09899494936611666 2023-12-22 06:39:58,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=451493.3333333333, ans=0.95 2023-12-22 06:39:58,845 INFO [train.py:886] (1/4) Epoch 15, batch 1000, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4906120.63 frames. ], batch size: 100, lr: 7.35e-03, grad_scale: 64.0 2023-12-22 06:39:59,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.80 vs. 
limit=22.5 2023-12-22 06:40:06,055 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.747e+01 2.873e+01 3.000e+01 3.786e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 06:40:12,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=451560.0, ans=0.2 2023-12-22 06:40:49,175 INFO [train.py:886] (1/4) Epoch 15, batch 1050, loss[loss=0.0154, audio_tagging_loss=0.0154, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4916704.72 frames. ], batch size: 100, lr: 7.35e-03, grad_scale: 64.0 2023-12-22 06:40:49,438 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:41:00,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=451893.3333333333, ans=0.0 2023-12-22 06:41:14,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451960.0, ans=0.125 2023-12-22 06:41:17,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=451960.0, ans=0.125 2023-12-22 06:41:25,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=452026.6666666667, ans=0.125 2023-12-22 06:41:30,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=452093.3333333333, ans=0.0 2023-12-22 06:41:36,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.67 vs. limit=22.5 2023-12-22 06:41:42,796 INFO [train.py:886] (1/4) Epoch 15, batch 1100, loss[loss=0.01046, audio_tagging_loss=0.01046, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4927659.79 frames. ], batch size: 100, lr: 7.35e-03, grad_scale: 64.0 2023-12-22 06:41:45,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=452160.0, ans=0.0 2023-12-22 06:41:49,463 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.730e+01 2.835e+01 3.021e+01 3.607e+01, threshold=5.671e+01, percent-clipped=0.0 2023-12-22 06:41:50,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2023-12-22 06:41:50,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.06 vs. limit=15.0 2023-12-22 06:42:13,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=452360.0, ans=0.05 2023-12-22 06:42:34,430 INFO [train.py:886] (1/4) Epoch 15, batch 1150, loss[loss=0.01773, audio_tagging_loss=0.01773, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4940338.69 frames. 
], batch size: 100, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:42:37,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=452493.3333333333, ans=0.0 2023-12-22 06:42:40,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=452493.3333333333, ans=0.5 2023-12-22 06:42:55,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=452626.6666666667, ans=0.0 2023-12-22 06:42:56,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=452626.6666666667, ans=0.0 2023-12-22 06:42:58,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0 2023-12-22 06:42:58,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=452626.6666666667, ans=0.125 2023-12-22 06:43:03,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452626.6666666667, ans=0.1 2023-12-22 06:43:12,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=452693.3333333333, ans=0.125 2023-12-22 06:43:26,234 INFO [train.py:886] (1/4) Epoch 15, batch 1200, loss[loss=0.0169, audio_tagging_loss=0.0169, over 24940.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4943274.87 frames. ], batch size: 100, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:43:26,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=452826.6666666667, ans=0.0 2023-12-22 06:43:32,798 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 2.691e+01 2.862e+01 3.008e+01 3.701e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-22 06:43:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=452893.3333333333, ans=0.0 2023-12-22 06:43:40,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=452893.3333333333, ans=0.0 2023-12-22 06:43:45,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=452893.3333333333, ans=0.125 2023-12-22 06:44:17,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-12-22 06:44:18,058 INFO [train.py:886] (1/4) Epoch 15, batch 1250, loss[loss=0.01472, audio_tagging_loss=0.01472, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4941577.97 frames. 
], batch size: 99, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:44:27,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=453226.6666666667, ans=0.0 2023-12-22 06:44:39,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453293.3333333333, ans=0.1 2023-12-22 06:44:40,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=453293.3333333333, ans=0.125 2023-12-22 06:44:43,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=453293.3333333333, ans=0.125 2023-12-22 06:45:04,710 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.236e-03 2023-12-22 06:45:11,722 INFO [train.py:886] (1/4) Epoch 15, batch 1300, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4944405.86 frames. ], batch size: 100, lr: 7.34e-03, grad_scale: 64.0 2023-12-22 06:45:15,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=453493.3333333333, ans=0.125 2023-12-22 06:45:19,300 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.803e+01 2.974e+01 3.116e+01 3.587e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 06:45:32,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=453626.6666666667, ans=0.0 2023-12-22 06:45:53,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=453760.0, ans=0.0 2023-12-22 06:45:54,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.26 vs. limit=15.0 2023-12-22 06:46:01,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453760.0, ans=0.125 2023-12-22 06:46:04,215 INFO [train.py:886] (1/4) Epoch 15, batch 1350, loss[loss=0.01531, audio_tagging_loss=0.01531, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4944853.27 frames. ], batch size: 100, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:46:05,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2023-12-22 06:46:23,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=453893.3333333333, ans=0.2 2023-12-22 06:46:24,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=453960.0, ans=0.2 2023-12-22 06:46:36,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=454026.6666666667, ans=0.125 2023-12-22 06:46:47,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=454093.3333333333, ans=0.125 2023-12-22 06:46:56,194 INFO [train.py:886] (1/4) Epoch 15, batch 1400, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4949436.29 frames. 
], batch size: 99, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:47:03,511 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.761e+01 2.896e+01 3.038e+01 4.038e+01, threshold=5.792e+01, percent-clipped=0.0 2023-12-22 06:47:09,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=454226.6666666667, ans=0.125 2023-12-22 06:47:16,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.80 vs. limit=22.5 2023-12-22 06:47:25,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=454360.0, ans=0.125 2023-12-22 06:47:32,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454360.0, ans=0.1 2023-12-22 06:47:33,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=454360.0, ans=0.125 2023-12-22 06:47:43,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=12.0 2023-12-22 06:47:47,641 INFO [train.py:886] (1/4) Epoch 15, batch 1450, loss[loss=0.01655, audio_tagging_loss=0.01655, over 21196.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4952110.37 frames. ], batch size: 107, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:48:13,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=454626.6666666667, ans=0.5 2023-12-22 06:48:16,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454626.6666666667, ans=0.0 2023-12-22 06:48:18,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=454693.3333333333, ans=0.125 2023-12-22 06:48:18,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-12-22 06:48:31,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=454760.0, ans=0.125 2023-12-22 06:48:40,098 INFO [train.py:886] (1/4) Epoch 15, batch 1500, loss[loss=0.01496, audio_tagging_loss=0.01496, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4958940.09 frames. 
], batch size: 100, lr: 7.33e-03, grad_scale: 64.0 2023-12-22 06:48:43,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=454826.6666666667, ans=0.07 2023-12-22 06:48:46,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=454826.6666666667, ans=0.0 2023-12-22 06:48:47,430 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.703e+01 2.865e+01 3.021e+01 3.763e+01, threshold=5.730e+01, percent-clipped=0.0 2023-12-22 06:48:54,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=454893.3333333333, ans=0.0 2023-12-22 06:48:55,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=454893.3333333333, ans=0.125 2023-12-22 06:49:09,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=454960.0, ans=0.2 2023-12-22 06:49:31,992 INFO [train.py:886] (1/4) Epoch 15, batch 1550, loss[loss=0.01539, audio_tagging_loss=0.01539, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4951795.22 frames. ], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:49:57,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=455293.3333333333, ans=0.125 2023-12-22 06:50:06,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=455360.0, ans=0.09899494936611666 2023-12-22 06:50:19,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455426.6666666667, ans=0.1 2023-12-22 06:50:23,335 INFO [train.py:886] (1/4) Epoch 15, batch 1600, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4945335.39 frames. ], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:50:25,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=455493.3333333333, ans=0.0 2023-12-22 06:50:26,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=455493.3333333333, ans=0.95 2023-12-22 06:50:29,907 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.762e+01 2.937e+01 3.082e+01 3.715e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-22 06:50:33,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2023-12-22 06:50:56,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=455693.3333333333, ans=0.04949747468305833 2023-12-22 06:51:14,884 INFO [train.py:886] (1/4) Epoch 15, batch 1650, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4944830.16 frames. 
], batch size: 100, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:51:15,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=455826.6666666667, ans=0.125 2023-12-22 06:51:18,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2023-12-22 06:51:18,847 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:51:32,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=455893.3333333333, ans=0.125 2023-12-22 06:51:33,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0 2023-12-22 06:52:01,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-12-22 06:52:06,839 INFO [train.py:886] (1/4) Epoch 15, batch 1700, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4946038.40 frames. ], batch size: 100, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:52:14,102 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 2.733e+01 2.857e+01 3.006e+01 3.981e+01, threshold=5.714e+01, percent-clipped=0.0 2023-12-22 06:52:35,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=456293.3333333333, ans=0.125 2023-12-22 06:52:35,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=456293.3333333333, ans=0.125 2023-12-22 06:52:43,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=456360.0, ans=0.125 2023-12-22 06:52:58,980 INFO [train.py:886] (1/4) Epoch 15, batch 1750, loss[loss=0.01358, audio_tagging_loss=0.01358, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4949915.70 frames. ], batch size: 99, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:53:02,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456493.3333333333, ans=0.1 2023-12-22 06:53:04,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-12-22 06:53:07,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=456493.3333333333, ans=0.2 2023-12-22 06:53:08,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=456560.0, ans=0.125 2023-12-22 06:53:10,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.82 vs. 
limit=12.0 2023-12-22 06:53:18,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=456626.6666666667, ans=0.125 2023-12-22 06:53:50,622 INFO [train.py:886] (1/4) Epoch 15, batch 1800, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4951787.14 frames. ], batch size: 99, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:53:51,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2023-12-22 06:53:57,945 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.752e+01 2.915e+01 3.059e+01 3.559e+01, threshold=5.830e+01, percent-clipped=0.0 2023-12-22 06:54:07,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=456893.3333333333, ans=0.125 2023-12-22 06:54:07,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-22 06:54:21,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=457026.6666666667, ans=0.125 2023-12-22 06:54:23,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-12-22 06:54:27,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457026.6666666667, ans=0.1 2023-12-22 06:54:27,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.86 vs. limit=22.5 2023-12-22 06:54:30,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=457026.6666666667, ans=0.125 2023-12-22 06:54:39,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=457093.3333333333, ans=0.125 2023-12-22 06:54:39,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=457093.3333333333, ans=10.0 2023-12-22 06:54:40,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=457093.3333333333, ans=0.125 2023-12-22 06:54:42,187 INFO [train.py:886] (1/4) Epoch 15, batch 1850, loss[loss=0.01666, audio_tagging_loss=0.01666, over 24750.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4949900.56 frames. ], batch size: 99, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:55:01,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. 
limit=22.5 2023-12-22 06:55:12,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=457293.3333333333, ans=0.125 2023-12-22 06:55:16,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=457360.0, ans=0.125 2023-12-22 06:55:25,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=457426.6666666667, ans=0.2 2023-12-22 06:55:25,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=457426.6666666667, ans=0.125 2023-12-22 06:55:26,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457426.6666666667, ans=0.125 2023-12-22 06:55:34,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2023-12-22 06:55:34,787 INFO [train.py:886] (1/4) Epoch 15, batch 1900, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4942689.59 frames. ], batch size: 99, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:55:39,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=457493.3333333333, ans=0.2 2023-12-22 06:55:41,996 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.819e+01 2.950e+01 3.088e+01 3.539e+01, threshold=5.899e+01, percent-clipped=0.0 2023-12-22 06:55:58,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=457626.6666666667, ans=0.125 2023-12-22 06:56:11,356 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.348e-02 2023-12-22 06:56:14,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=457693.3333333333, ans=0.025 2023-12-22 06:56:24,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457760.0, ans=0.1 2023-12-22 06:56:26,094 INFO [train.py:886] (1/4) Epoch 15, batch 1950, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4941613.38 frames. ], batch size: 100, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:56:27,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=457826.6666666667, ans=10.0 2023-12-22 06:56:34,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457826.6666666667, ans=0.1 2023-12-22 06:56:48,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=457960.0, ans=0.0 2023-12-22 06:56:59,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.72 vs. 
limit=12.0 2023-12-22 06:57:04,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=458026.6666666667, ans=0.1 2023-12-22 06:57:07,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=458093.3333333333, ans=0.125 2023-12-22 06:57:18,510 INFO [train.py:886] (1/4) Epoch 15, batch 2000, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4942641.16 frames. ], batch size: 99, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:57:25,043 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.729e+01 2.880e+01 3.053e+01 3.863e+01, threshold=5.759e+01, percent-clipped=0.0 2023-12-22 06:57:27,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=458226.6666666667, ans=0.09899494936611666 2023-12-22 06:57:33,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-12-22 06:57:37,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458226.6666666667, ans=0.125 2023-12-22 06:57:39,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=458293.3333333333, ans=0.125 2023-12-22 06:57:43,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=458293.3333333333, ans=0.125 2023-12-22 06:57:50,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-12-22 06:57:55,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=458360.0, ans=0.125 2023-12-22 06:57:58,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2023-12-22 06:58:10,748 INFO [train.py:886] (1/4) Epoch 15, batch 2050, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4945594.66 frames. ], batch size: 100, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:58:34,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=458626.6666666667, ans=0.125 2023-12-22 06:58:35,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=458626.6666666667, ans=0.0 2023-12-22 06:58:54,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458760.0, ans=0.1 2023-12-22 06:59:01,469 INFO [train.py:886] (1/4) Epoch 15, batch 2100, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4947081.95 frames. 
], batch size: 99, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 06:59:09,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.721e+01 2.805e+01 2.996e+01 3.469e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 06:59:11,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=458893.3333333333, ans=0.2 2023-12-22 06:59:16,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=458893.3333333333, ans=0.125 2023-12-22 06:59:17,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=458893.3333333333, ans=0.125 2023-12-22 06:59:34,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0 2023-12-22 06:59:53,689 INFO [train.py:886] (1/4) Epoch 15, batch 2150, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4949112.72 frames. ], batch size: 100, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 07:00:02,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=459160.0, ans=0.125 2023-12-22 07:00:08,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459226.6666666667, ans=0.1 2023-12-22 07:00:13,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-12-22 07:00:37,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=459426.6666666667, ans=0.09899494936611666 2023-12-22 07:00:38,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=459426.6666666667, ans=0.09899494936611666 2023-12-22 07:00:43,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-12-22 07:00:44,580 INFO [train.py:886] (1/4) Epoch 15, batch 2200, loss[loss=0.01614, audio_tagging_loss=0.01614, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4947676.50 frames. 
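The recurring optim.py WARNING lines report quartiles of recent gradient norms together with the clipping threshold, and across this section the threshold consistently works out to Clipping_scale (2.0) times the median quartile, e.g. 2.0 x 2.915e+01 = 5.830e+01 and 2.0 x 2.950e+01 = 5.899e+01 in the warnings above. A minimal sketch of that bookkeeping, assuming a rolling buffer of per-batch gradient norms (the function name and buffer handling are illustrative, not icefall's actual optim.py API):

```python
import torch

def clip_report(recent_grad_norms, clipping_scale=2.0):
    # Quartiles (min, 25%, median, 75%, max) of the recent grad norms,
    # matching the "grad-norm quartiles ... threshold=..." warning format.
    norms = torch.tensor(recent_grad_norms, dtype=torch.float32)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # 2.0 x median, as logged
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles.tolist(), threshold.item(), percent_clipped.item()
```

With a 2x-median threshold the max quartile (around 3.5e+01 here) sits well below the threshold, which is why percent-clipped stays at 0.0 throughout this stretch of training.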
], batch size: 99, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 07:00:52,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.447e+01 2.795e+01 2.951e+01 3.077e+01 3.607e+01, threshold=5.903e+01, percent-clipped=0.0 2023-12-22 07:00:59,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=459560.0, ans=0.125 2023-12-22 07:01:10,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459626.6666666667, ans=0.125 2023-12-22 07:01:11,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=459626.6666666667, ans=0.125 2023-12-22 07:01:12,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=459626.6666666667, ans=0.2 2023-12-22 07:01:16,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=459693.3333333333, ans=0.0 2023-12-22 07:01:34,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=459760.0, ans=0.125 2023-12-22 07:01:36,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=459826.6666666667, ans=0.125 2023-12-22 07:01:37,225 INFO [train.py:886] (1/4) Epoch 15, batch 2250, loss[loss=0.0156, audio_tagging_loss=0.0156, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4941915.51 frames. ], batch size: 100, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 07:01:48,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=459893.3333333333, ans=0.0 2023-12-22 07:01:58,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459960.0, ans=0.125 2023-12-22 07:02:03,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=459960.0, ans=0.0 2023-12-22 07:02:11,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=460026.6666666667, ans=0.125 2023-12-22 07:02:13,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=460026.6666666667, ans=0.125 2023-12-22 07:02:14,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0 2023-12-22 07:02:18,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=460093.3333333333, ans=0.125 2023-12-22 07:02:29,289 INFO [train.py:886] (1/4) Epoch 15, batch 2300, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4940500.38 frames. 
], batch size: 100, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:02:31,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=460160.0, ans=0.0 2023-12-22 07:02:36,498 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.737e+01 2.896e+01 3.072e+01 5.073e+01, threshold=5.791e+01, percent-clipped=0.0 2023-12-22 07:02:44,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460226.6666666667, ans=0.0 2023-12-22 07:03:12,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=460426.6666666667, ans=0.0 2023-12-22 07:03:20,221 INFO [train.py:886] (1/4) Epoch 15, batch 2350, loss[loss=0.01794, audio_tagging_loss=0.01794, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4947893.06 frames. ], batch size: 99, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:03:51,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=460693.3333333333, ans=0.2 2023-12-22 07:04:13,017 INFO [train.py:886] (1/4) Epoch 15, batch 2400, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4948332.40 frames. ], batch size: 100, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:04:15,193 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:04:19,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=460826.6666666667, ans=0.125 2023-12-22 07:04:19,754 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.726e+01 2.841e+01 2.994e+01 3.395e+01, threshold=5.683e+01, percent-clipped=0.0 2023-12-22 07:04:24,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=460893.3333333333, ans=0.125 2023-12-22 07:04:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=460893.3333333333, ans=0.0 2023-12-22 07:04:29,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=460893.3333333333, ans=0.2 2023-12-22 07:04:30,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=460893.3333333333, ans=0.125 2023-12-22 07:04:43,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=461026.6666666667, ans=10.0 2023-12-22 07:04:46,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=461026.6666666667, ans=0.125 2023-12-22 07:04:51,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. 
limit=15.0 2023-12-22 07:04:53,155 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.039e-02 2023-12-22 07:04:53,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=461093.3333333333, ans=0.125 2023-12-22 07:05:04,078 INFO [train.py:886] (1/4) Epoch 15, batch 2450, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4952965.86 frames. ], batch size: 100, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:05:09,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=461160.0, ans=0.0 2023-12-22 07:05:38,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461360.0, ans=0.125 2023-12-22 07:05:39,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=461360.0, ans=0.125 2023-12-22 07:05:42,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=461360.0, ans=0.125 2023-12-22 07:05:48,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=461426.6666666667, ans=0.0 2023-12-22 07:05:56,313 INFO [train.py:886] (1/4) Epoch 15, batch 2500, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4953585.13 frames. ], batch size: 99, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:05:59,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461493.3333333333, ans=0.125 2023-12-22 07:06:02,977 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 2.837e+01 2.934e+01 3.109e+01 3.960e+01, threshold=5.868e+01, percent-clipped=0.0 2023-12-22 07:06:03,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=461493.3333333333, ans=0.125 2023-12-22 07:06:04,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=461493.3333333333, ans=0.125 2023-12-22 07:06:08,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=12.0 2023-12-22 07:06:14,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=461560.0, ans=0.125 2023-12-22 07:06:15,076 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.272e-03 2023-12-22 07:06:17,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=461626.6666666667, ans=0.125 2023-12-22 07:06:23,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461626.6666666667, ans=0.125 2023-12-22 07:06:46,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=461760.0, ans=0.125 2023-12-22 07:06:47,954 INFO [train.py:886] (1/4) Epoch 15, batch 2550, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24750.00 frames. 
], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4950270.95 frames. ], batch size: 99, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:06:50,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=461826.6666666667, ans=0.125 2023-12-22 07:06:53,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=461826.6666666667, ans=10.0 2023-12-22 07:06:55,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2023-12-22 07:07:09,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2023-12-22 07:07:18,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2023-12-22 07:07:39,801 INFO [train.py:886] (1/4) Epoch 15, batch 2600, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4949634.61 frames. ], batch size: 100, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:07:41,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=462160.0, ans=0.125 2023-12-22 07:07:47,086 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.786e+01 2.944e+01 3.057e+01 3.830e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 07:07:57,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=462226.6666666667, ans=0.015 2023-12-22 07:08:14,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462360.0, ans=0.125 2023-12-22 07:08:19,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-12-22 07:08:23,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2023-12-22 07:08:29,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=462426.6666666667, ans=0.125 2023-12-22 07:08:31,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=462493.3333333333, ans=0.125 2023-12-22 07:08:32,434 INFO [train.py:886] (1/4) Epoch 15, batch 2650, loss[loss=0.01595, audio_tagging_loss=0.01595, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4951274.21 frames. ], batch size: 100, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:08:42,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=462560.0, ans=0.5 2023-12-22 07:08:50,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=462560.0, ans=0.125 2023-12-22 07:08:59,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.32 vs. 
limit=22.5 2023-12-22 07:09:05,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=462693.3333333333, ans=0.125 2023-12-22 07:09:07,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=462693.3333333333, ans=0.0 2023-12-22 07:09:14,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=462760.0, ans=0.015 2023-12-22 07:09:20,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=462760.0, ans=0.0 2023-12-22 07:09:23,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462826.6666666667, ans=0.125 2023-12-22 07:09:24,939 INFO [train.py:886] (1/4) Epoch 15, batch 2700, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4949625.68 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:09:32,225 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.416e+01 2.720e+01 2.866e+01 2.980e+01 3.396e+01, threshold=5.733e+01, percent-clipped=0.0 2023-12-22 07:09:37,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-12-22 07:09:40,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=26.79 vs. limit=15.0 2023-12-22 07:10:05,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0 2023-12-22 07:10:14,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=463093.3333333333, ans=0.0 2023-12-22 07:10:16,481 INFO [train.py:886] (1/4) Epoch 15, batch 2750, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4952567.69 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:10:51,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=463360.0, ans=0.09899494936611666 2023-12-22 07:10:52,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=463360.0, ans=15.0 2023-12-22 07:10:59,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=463426.6666666667, ans=0.2 2023-12-22 07:11:03,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=463426.6666666667, ans=0.125 2023-12-22 07:11:09,256 INFO [train.py:886] (1/4) Epoch 15, batch 2800, loss[loss=0.01504, audio_tagging_loss=0.01504, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4951450.66 frames. 
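Each scaling.py:213 line prints the current value of a ScheduledFloat: a hyperparameter (dropout probability, skip rate, balancer probability, bypass scale, and so on) that is interpolated as a function of batch_count rather than held fixed. By this point in training most have settled at their final values (0.1, 0.125, 0.2, ...). A self-contained sketch of such a piecewise-linear schedule; the class below is illustrative, not icefall's actual ScheduledFloat:

```python
class PiecewiseScheduledFloat:
    """Illustrative stand-in for a batch-count-driven schedule like the
    ScheduledFloat values logged above; not icefall's actual class."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order.
        self.points = list(points)

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return pts[-1][1]  # past the last knot the value is held constant

# Assuming, for illustration, a dropout annealed from 0.3 to 0.1 over the
# first 20k batch-count units: by batch_count ~457k it reads 0.1, matching
# the feed_forward1.out_proj.dropout_p ... ans=0.1 entries above.
dropout_p = PiecewiseScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(457026.67) == 0.1
```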
], batch size: 99, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:11:15,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463493.3333333333, ans=0.1 2023-12-22 07:11:17,365 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.810e+01 2.949e+01 3.108e+01 3.472e+01, threshold=5.898e+01, percent-clipped=0.0 2023-12-22 07:11:18,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=463560.0, ans=0.125 2023-12-22 07:11:24,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=463560.0, ans=0.2 2023-12-22 07:11:25,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2023-12-22 07:11:41,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-12-22 07:12:00,616 INFO [train.py:886] (1/4) Epoch 15, batch 2850, loss[loss=0.0129, audio_tagging_loss=0.0129, over 21920.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4945506.38 frames. ], batch size: 107, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:12:03,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=463826.6666666667, ans=0.2 2023-12-22 07:12:03,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=463826.6666666667, ans=0.2 2023-12-22 07:12:34,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=464026.6666666667, ans=0.125 2023-12-22 07:12:35,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=464026.6666666667, ans=0.2 2023-12-22 07:12:35,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-22 07:12:44,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464093.3333333333, ans=0.1 2023-12-22 07:12:48,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=464093.3333333333, ans=0.125 2023-12-22 07:12:50,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=12.0 2023-12-22 07:12:52,601 INFO [train.py:886] (1/4) Epoch 15, batch 2900, loss[loss=0.01722, audio_tagging_loss=0.01722, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4945993.50 frames. 
], batch size: 100, lr: 7.25e-03, grad_scale: 64.0 2023-12-22 07:13:00,913 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.777e+01 2.915e+01 3.046e+01 3.469e+01, threshold=5.830e+01, percent-clipped=0.0 2023-12-22 07:13:12,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=464293.3333333333, ans=0.0 2023-12-22 07:13:36,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=464426.6666666667, ans=0.125 2023-12-22 07:13:38,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=464426.6666666667, ans=0.125 2023-12-22 07:13:44,940 INFO [train.py:886] (1/4) Epoch 15, batch 2950, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4954269.86 frames. ], batch size: 100, lr: 7.25e-03, grad_scale: 64.0 2023-12-22 07:13:52,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464493.3333333333, ans=0.1 2023-12-22 07:14:10,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=464626.6666666667, ans=0.0 2023-12-22 07:14:17,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.79 vs. limit=10.0 2023-12-22 07:14:26,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=15.0 2023-12-22 07:14:36,687 INFO [train.py:886] (1/4) Epoch 15, batch 3000, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4958580.94 frames. ], batch size: 100, lr: 7.25e-03, grad_scale: 64.0 2023-12-22 07:14:36,688 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 07:14:57,489 INFO [train.py:917] (1/4) Epoch 15, validation: loss=0.03387, audio_tagging_loss=0.03387, over 3737520.00 frames. 2023-12-22 07:14:57,490 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 07:15:04,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=12.0 2023-12-22 07:15:05,582 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.703e+01 2.839e+01 2.986e+01 3.331e+01, threshold=5.678e+01, percent-clipped=0.0 2023-12-22 07:15:33,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=465026.6666666667, ans=0.2 2023-12-22 07:15:41,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=465093.3333333333, ans=0.2 2023-12-22 07:15:48,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=465160.0, ans=0.125 2023-12-22 07:15:49,509 INFO [train.py:886] (1/4) Epoch 15, batch 3050, loss[loss=0.01567, audio_tagging_loss=0.01567, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4959485.60 frames. 
], batch size: 100, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:15:55,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=465160.0, ans=0.5 2023-12-22 07:15:58,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=465226.6666666667, ans=0.07 2023-12-22 07:16:23,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=465360.0, ans=0.125 2023-12-22 07:16:40,684 INFO [train.py:886] (1/4) Epoch 15, batch 3100, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4963245.94 frames. ], batch size: 100, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:16:48,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=465493.3333333333, ans=0.025 2023-12-22 07:16:49,704 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.496e+01 2.762e+01 2.896e+01 3.048e+01 3.549e+01, threshold=5.793e+01, percent-clipped=0.0 2023-12-22 07:16:52,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465560.0, ans=0.125 2023-12-22 07:16:53,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2023-12-22 07:17:16,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2023-12-22 07:17:21,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=465693.3333333333, ans=0.2 2023-12-22 07:17:21,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=465693.3333333333, ans=0.125 2023-12-22 07:17:29,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2023-12-22 07:17:33,964 INFO [train.py:886] (1/4) Epoch 15, batch 3150, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4960542.36 frames. ], batch size: 99, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:17:43,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=465893.3333333333, ans=0.125 2023-12-22 07:17:45,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=465893.3333333333, ans=0.0 2023-12-22 07:17:58,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=465960.0, ans=0.0 2023-12-22 07:18:23,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=466093.3333333333, ans=0.125 2023-12-22 07:18:25,592 INFO [train.py:886] (1/4) Epoch 15, batch 3200, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4955564.90 frames. 
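In the train.py:886 lines, loss[...] is the current batch (about 24,750-25,000 frames) while tot_loss[...] is a frame-weighted running average whose window hovers around 4.95 million frames. That span is consistent with an exponential decay of roughly 0.995 per batch, since 24,750 / (1 - 0.995) = 4,950,000; the sketch below uses that constant for illustration only:

```python
class FrameWeightedAverage:
    """Sketch of tot_loss-style bookkeeping: each batch's loss is weighted
    by its frame count and folded into a decayed running total, so the
    effective window settles near num_frames / (1 - decay) frames.
    The decay constant is illustrative, not taken from icefall."""

    def __init__(self, decay=0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss, num_frames):
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self):
        return self.loss_sum / max(self.frames, 1.0)
```

This also explains why tot_loss moves much more smoothly (0.0141-0.0145 across this section) than the per-batch loss, which swings between roughly 0.011 and 0.018.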
], batch size: 100, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:18:27,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-12-22 07:18:32,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=466160.0, ans=0.0 2023-12-22 07:18:34,466 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.747e+01 2.900e+01 3.032e+01 3.537e+01, threshold=5.801e+01, percent-clipped=0.0 2023-12-22 07:18:34,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=466160.0, ans=0.125 2023-12-22 07:18:51,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=466293.3333333333, ans=0.0 2023-12-22 07:18:52,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466293.3333333333, ans=0.125 2023-12-22 07:18:53,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=466293.3333333333, ans=0.125 2023-12-22 07:19:00,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0 2023-12-22 07:19:14,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=466426.6666666667, ans=0.125 2023-12-22 07:19:16,947 INFO [train.py:886] (1/4) Epoch 15, batch 3250, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4958352.29 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:19:17,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=466493.3333333333, ans=0.125 2023-12-22 07:19:41,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-12-22 07:19:57,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2023-12-22 07:20:03,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=466760.0, ans=0.125 2023-12-22 07:20:04,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=466760.0, ans=0.125 2023-12-22 07:20:09,538 INFO [train.py:886] (1/4) Epoch 15, batch 3300, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4955547.20 frames. 
], batch size: 99, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:20:13,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=466826.6666666667, ans=0.125 2023-12-22 07:20:17,792 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 2.692e+01 2.863e+01 2.982e+01 3.634e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 07:20:20,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=466893.3333333333, ans=0.0 2023-12-22 07:20:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=466893.3333333333, ans=0.125 2023-12-22 07:20:46,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-22 07:20:56,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=467093.3333333333, ans=0.125 2023-12-22 07:21:00,885 INFO [train.py:886] (1/4) Epoch 15, batch 3350, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4955216.86 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:21:11,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=467226.6666666667, ans=0.125 2023-12-22 07:21:50,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=467426.6666666667, ans=0.2 2023-12-22 07:21:53,189 INFO [train.py:886] (1/4) Epoch 15, batch 3400, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4953567.22 frames. ], batch size: 99, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:21:53,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-12-22 07:21:56,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-12-22 07:21:58,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=15.0 2023-12-22 07:22:00,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.741e+01 2.914e+01 3.061e+01 3.505e+01, threshold=5.827e+01, percent-clipped=0.0 2023-12-22 07:22:21,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=467626.6666666667, ans=0.125 2023-12-22 07:22:35,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=467760.0, ans=0.125 2023-12-22 07:22:41,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2023-12-22 07:22:44,539 INFO [train.py:886] (1/4) Epoch 15, batch 3450, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4948009.27 frames. 
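The scaling.py:1022 lines track, for each Whiten module, a covariance-uniformity metric against its whitening_limit; a corrective penalty nudges activations toward whiter (more isotropic) statistics as the metric climbs toward or past the limit. One plausible form of such a metric is the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly white features and grows as the spectrum becomes less uniform. This is a hedged guess at the logged quantity, not icefall's exact formula:

```python
import torch

def whiteness_metric(x):
    """Covariance-uniformity metric over the channel dimension: 1.0 when
    features are perfectly white, larger when a few directions dominate.
    A guess at the quantity behind 'metric=... vs. limit=...'."""
    x = x.reshape(-1, x.shape[-1])         # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)    # center per channel
    cov = (x.T @ x) / x.shape[0]           # (C, C) covariance estimate
    eig = torch.linalg.eigvalsh(cov)       # real eigenvalues (symmetric cov)
    return float((eig ** 2).mean() / eig.mean() ** 2)
```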
], batch size: 99, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:22:44,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=467826.6666666667, ans=0.2 2023-12-22 07:23:02,220 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:23:06,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2023-12-22 07:23:23,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.20 vs. limit=15.0 2023-12-22 07:23:27,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=468093.3333333333, ans=0.125 2023-12-22 07:23:36,034 INFO [train.py:886] (1/4) Epoch 15, batch 3500, loss[loss=0.01605, audio_tagging_loss=0.01605, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4941233.78 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:23:41,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=468160.0, ans=0.0 2023-12-22 07:23:43,627 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 2.815e+01 2.965e+01 3.096e+01 3.766e+01, threshold=5.930e+01, percent-clipped=0.0 2023-12-22 07:23:47,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=468226.6666666667, ans=0.5 2023-12-22 07:24:07,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=468360.0, ans=0.125 2023-12-22 07:24:14,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=468360.0, ans=0.0 2023-12-22 07:24:17,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=468426.6666666667, ans=0.0 2023-12-22 07:24:19,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=468426.6666666667, ans=0.0 2023-12-22 07:24:28,370 INFO [train.py:886] (1/4) Epoch 15, batch 3550, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4943020.79 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:24:29,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=468493.3333333333, ans=0.125 2023-12-22 07:24:39,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468560.0, ans=0.125 2023-12-22 07:24:43,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=468560.0, ans=0.0 2023-12-22 07:24:45,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. 
limit=6.0 2023-12-22 07:24:49,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=468626.6666666667, ans=0.0 2023-12-22 07:24:59,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=468693.3333333333, ans=0.125 2023-12-22 07:25:06,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=468693.3333333333, ans=0.125 2023-12-22 07:25:12,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=468760.0, ans=0.09899494936611666 2023-12-22 07:25:19,409 INFO [train.py:886] (1/4) Epoch 15, batch 3600, loss[loss=0.0161, audio_tagging_loss=0.0161, over 25000.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4945868.60 frames. ], batch size: 100, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:25:22,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.461e-02 2023-12-22 07:25:26,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=468826.6666666667, ans=0.2 2023-12-22 07:25:28,303 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.699e+01 2.856e+01 3.019e+01 3.433e+01, threshold=5.713e+01, percent-clipped=0.0 2023-12-22 07:25:33,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=468893.3333333333, ans=0.0 2023-12-22 07:25:37,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2023-12-22 07:25:43,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468960.0, ans=0.1 2023-12-22 07:25:59,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=469093.3333333333, ans=0.0 2023-12-22 07:26:10,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=469160.0, ans=0.2 2023-12-22 07:26:11,398 INFO [train.py:886] (1/4) Epoch 15, batch 3650, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4946511.84 frames. ], batch size: 100, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:26:36,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2023-12-22 07:26:46,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=469360.0, ans=0.95 2023-12-22 07:27:02,575 INFO [train.py:886] (1/4) Epoch 15, batch 3700, loss[loss=0.01647, audio_tagging_loss=0.01647, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4952211.26 frames. 
], batch size: 99, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:27:03,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=469493.3333333333, ans=0.125 2023-12-22 07:27:11,638 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.741e+01 2.872e+01 3.028e+01 3.371e+01, threshold=5.744e+01, percent-clipped=0.0 2023-12-22 07:27:12,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=469560.0, ans=0.2 2023-12-22 07:27:13,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0 2023-12-22 07:27:13,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=469560.0, ans=0.125 2023-12-22 07:27:24,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=469626.6666666667, ans=0.0 2023-12-22 07:27:25,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2023-12-22 07:27:39,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0 2023-12-22 07:27:47,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=469760.0, ans=0.125 2023-12-22 07:27:55,262 INFO [train.py:886] (1/4) Epoch 15, batch 3750, loss[loss=0.01596, audio_tagging_loss=0.01596, over 24750.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4951421.56 frames. ], batch size: 99, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:27:57,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469826.6666666667, ans=0.1 2023-12-22 07:28:12,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=469893.3333333333, ans=0.125 2023-12-22 07:28:27,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=470026.6666666667, ans=0.0 2023-12-22 07:28:45,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470160.0, ans=0.1 2023-12-22 07:28:47,228 INFO [train.py:886] (1/4) Epoch 15, batch 3800, loss[loss=0.01393, audio_tagging_loss=0.01393, over 24046.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4950148.85 frames. ], batch size: 100, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:28:48,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470160.0, ans=0.1 2023-12-22 07:28:55,408 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 2.844e+01 2.987e+01 3.173e+01 3.633e+01, threshold=5.973e+01, percent-clipped=0.0 2023-12-22 07:28:59,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.79 vs. 
limit=22.5 2023-12-22 07:28:59,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=470226.6666666667, ans=15.0 2023-12-22 07:29:03,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=470226.6666666667, ans=0.125 2023-12-22 07:29:18,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=470360.0, ans=0.2 2023-12-22 07:29:32,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470426.6666666667, ans=0.125 2023-12-22 07:29:37,991 INFO [train.py:886] (1/4) Epoch 15, batch 3850, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4952989.66 frames. ], batch size: 99, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:29:43,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=470493.3333333333, ans=0.125 2023-12-22 07:30:00,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=470626.6666666667, ans=0.0 2023-12-22 07:30:30,450 INFO [train.py:886] (1/4) Epoch 15, batch 3900, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4948897.41 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:30:38,604 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.742e+01 2.857e+01 3.020e+01 3.655e+01, threshold=5.714e+01, percent-clipped=0.0 2023-12-22 07:30:38,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=470826.6666666667, ans=0.125 2023-12-22 07:30:40,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=470893.3333333333, ans=0.5 2023-12-22 07:31:01,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=471026.6666666667, ans=0.2 2023-12-22 07:31:12,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=471093.3333333333, ans=0.0 2023-12-22 07:31:16,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=471093.3333333333, ans=0.125 2023-12-22 07:31:22,515 INFO [train.py:886] (1/4) Epoch 15, batch 3950, loss[loss=0.01745, audio_tagging_loss=0.01745, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4949609.82 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:31:27,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471160.0, ans=0.1 2023-12-22 07:31:52,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.47 vs. 
limit=22.5 2023-12-22 07:31:56,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=471360.0, ans=0.125 2023-12-22 07:32:01,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471360.0, ans=0.1 2023-12-22 07:32:01,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-12-22 07:32:13,923 INFO [train.py:886] (1/4) Epoch 15, batch 4000, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4955725.17 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:32:18,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=471493.3333333333, ans=0.125 2023-12-22 07:32:21,504 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.461e+01 2.772e+01 2.863e+01 2.977e+01 3.465e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 07:32:27,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=471560.0, ans=0.1 2023-12-22 07:32:30,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=471560.0, ans=0.0 2023-12-22 07:32:32,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=471560.0, ans=0.125 2023-12-22 07:32:52,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471693.3333333333, ans=0.125 2023-12-22 07:33:04,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=471826.6666666667, ans=0.125 2023-12-22 07:33:05,124 INFO [train.py:886] (1/4) Epoch 15, batch 4050, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4958106.47 frames. ], batch size: 99, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:33:06,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=471826.6666666667, ans=0.0 2023-12-22 07:33:07,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=471826.6666666667, ans=0.04949747468305833 2023-12-22 07:33:18,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=471893.3333333333, ans=0.09899494936611666 2023-12-22 07:33:39,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472026.6666666667, ans=0.1 2023-12-22 07:33:42,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=472026.6666666667, ans=0.0 2023-12-22 07:33:57,431 INFO [train.py:886] (1/4) Epoch 15, batch 4100, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4952033.48 frames. 
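The learning rate printed at each 50-batch marker decays slowly within the epoch, from 7.31e-03 down to 7.17e-03 across this section. icefall's Eden-style schedulers produce this shape by damping the base rate smoothly in both the batch and epoch dimensions; the form below is a sketch with illustrative constants, not a guaranteed match to this run's configuration:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden-style decay: inverse-quartic-root damping in both batch count
    # and epoch. lr_batches / lr_epochs defaults are illustrative.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```

Deep into epoch 15 both factors change very slowly, which is why the logged lr drifts by only ~2% over several thousand batches here.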
], batch size: 99, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:34:05,957 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.831e+01 2.961e+01 3.185e+01 3.752e+01, threshold=5.922e+01, percent-clipped=0.0 2023-12-22 07:34:08,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=472226.6666666667, ans=0.0 2023-12-22 07:34:15,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=472226.6666666667, ans=0.125 2023-12-22 07:34:33,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=472360.0, ans=0.1 2023-12-22 07:34:42,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=15.0 2023-12-22 07:34:49,340 INFO [train.py:886] (1/4) Epoch 15, batch 4150, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4940485.85 frames. ], batch size: 99, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:34:51,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5 2023-12-22 07:35:13,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=472626.6666666667, ans=0.125 2023-12-22 07:35:14,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=472626.6666666667, ans=0.1 2023-12-22 07:35:30,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=472760.0, ans=0.05 2023-12-22 07:35:40,925 INFO [train.py:886] (1/4) Epoch 15, batch 4200, loss[loss=0.01369, audio_tagging_loss=0.01369, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4946599.87 frames. ], batch size: 100, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:35:42,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=472826.6666666667, ans=0.125 2023-12-22 07:35:44,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=472826.6666666667, ans=0.2 2023-12-22 07:35:45,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-22 07:35:48,442 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.360e+01 2.733e+01 2.855e+01 3.027e+01 3.631e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-22 07:36:05,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. 
limit=15.0 2023-12-22 07:36:06,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=472960.0, ans=0.1 2023-12-22 07:36:13,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473026.6666666667, ans=0.1 2023-12-22 07:36:19,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=473026.6666666667, ans=0.125 2023-12-22 07:36:25,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=473093.3333333333, ans=0.05 2023-12-22 07:36:32,704 INFO [train.py:886] (1/4) Epoch 15, batch 4250, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4955405.50 frames. ], batch size: 100, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:36:37,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2023-12-22 07:36:43,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=473226.6666666667, ans=0.0 2023-12-22 07:36:46,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 07:36:54,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2023-12-22 07:36:56,213 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:36:57,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-22 07:37:00,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=473293.3333333333, ans=0.0 2023-12-22 07:37:24,538 INFO [train.py:886] (1/4) Epoch 15, batch 4300, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4959532.00 frames. ], batch size: 100, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:37:32,803 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+01 2.780e+01 2.893e+01 3.020e+01 3.657e+01, threshold=5.787e+01, percent-clipped=0.0 2023-12-22 07:37:41,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-12-22 07:38:14,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-12-22 07:38:16,786 INFO [train.py:886] (1/4) Epoch 15, batch 4350, loss[loss=0.01244, audio_tagging_loss=0.01244, over 21813.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4959350.49 frames. 
], batch size: 107, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:38:19,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=473826.6666666667, ans=0.125 2023-12-22 07:38:26,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=473893.3333333333, ans=0.04949747468305833 2023-12-22 07:38:34,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=473893.3333333333, ans=0.125 2023-12-22 07:38:50,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2023-12-22 07:38:52,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=474026.6666666667, ans=0.125 2023-12-22 07:39:08,302 INFO [train.py:886] (1/4) Epoch 15, batch 4400, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4957868.55 frames. ], batch size: 99, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:39:08,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=474160.0, ans=0.125 2023-12-22 07:39:14,829 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.097e-02 2023-12-22 07:39:16,467 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.787e+01 2.952e+01 3.076e+01 3.640e+01, threshold=5.903e+01, percent-clipped=0.0 2023-12-22 07:39:39,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2023-12-22 07:39:53,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2023-12-22 07:39:56,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=474426.6666666667, ans=0.0 2023-12-22 07:39:59,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=474493.3333333333, ans=0.0 2023-12-22 07:40:00,417 INFO [train.py:886] (1/4) Epoch 15, batch 4450, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4951800.09 frames. ], batch size: 99, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:40:22,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=474626.6666666667, ans=0.125 2023-12-22 07:40:45,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=474760.0, ans=0.09899494936611666 2023-12-22 07:40:51,815 INFO [train.py:886] (1/4) Epoch 15, batch 4500, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4953423.17 frames. 
], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:40:59,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=474826.6666666667, ans=0.1 2023-12-22 07:41:00,682 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 2.748e+01 2.874e+01 3.023e+01 3.451e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 07:41:29,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=475026.6666666667, ans=0.0 2023-12-22 07:41:38,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2023-12-22 07:41:43,291 INFO [train.py:886] (1/4) Epoch 15, batch 4550, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4956894.81 frames. ], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:42:09,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=475293.3333333333, ans=0.0 2023-12-22 07:42:20,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. limit=10.0 2023-12-22 07:42:35,161 INFO [train.py:886] (1/4) Epoch 15, batch 4600, loss[loss=0.01421, audio_tagging_loss=0.01421, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4959895.47 frames. ], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:42:37,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475493.3333333333, ans=0.1 2023-12-22 07:42:37,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. 
limit=15.0 2023-12-22 07:42:39,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=475493.3333333333, ans=0.125 2023-12-22 07:42:43,596 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.790e+01 2.912e+01 3.073e+01 3.704e+01, threshold=5.824e+01, percent-clipped=0.0 2023-12-22 07:42:48,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=475560.0, ans=0.0 2023-12-22 07:42:55,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=475626.6666666667, ans=0.125 2023-12-22 07:42:58,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=475626.6666666667, ans=0.125 2023-12-22 07:43:04,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475626.6666666667, ans=0.1 2023-12-22 07:43:06,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=475693.3333333333, ans=0.125 2023-12-22 07:43:18,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=475760.0, ans=0.125 2023-12-22 07:43:19,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=475760.0, ans=0.0 2023-12-22 07:43:21,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=475760.0, ans=0.125 2023-12-22 07:43:27,438 INFO [train.py:886] (1/4) Epoch 15, batch 4650, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4958578.99 frames. ], batch size: 100, lr: 7.16e-03, grad_scale: 64.0 2023-12-22 07:43:30,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=475826.6666666667, ans=0.2 2023-12-22 07:43:33,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=475826.6666666667, ans=0.035 2023-12-22 07:43:37,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=475893.3333333333, ans=0.125 2023-12-22 07:43:53,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0 2023-12-22 07:44:16,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=476093.3333333333, ans=0.1 2023-12-22 07:44:18,121 INFO [train.py:886] (1/4) Epoch 15, batch 4700, loss[loss=0.01509, audio_tagging_loss=0.01509, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4952086.87 frames. 
], batch size: 99, lr: 7.16e-03, grad_scale: 64.0 2023-12-22 07:44:26,798 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.835e+01 2.953e+01 3.126e+01 3.694e+01, threshold=5.906e+01, percent-clipped=0.0 2023-12-22 07:44:56,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=476426.6666666667, ans=0.125 2023-12-22 07:45:00,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=476426.6666666667, ans=0.125 2023-12-22 07:45:03,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=476426.6666666667, ans=0.125 2023-12-22 07:45:05,677 INFO [train.py:886] (1/4) Epoch 15, batch 4750, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4951210.48 frames. ], batch size: 99, lr: 7.16e-03, grad_scale: 128.0 2023-12-22 07:45:09,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=476493.3333333333, ans=0.0 2023-12-22 07:45:42,780 INFO [train.py:886] (1/4) Epoch 16, batch 0, loss[loss=0.03805, audio_tagging_loss=0.03805, over 20538.00 frames. ], tot_loss[loss=0.03805, audio_tagging_loss=0.03805, over 20538.00 frames. ], batch size: 107, lr: 6.93e-03, grad_scale: 32.0 2023-12-22 07:45:42,780 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 07:46:04,064 INFO [train.py:917] (1/4) Epoch 16, validation: loss=0.03318, audio_tagging_loss=0.03318, over 3737520.00 frames. 2023-12-22 07:46:04,064 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 07:46:24,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=476733.3333333333, ans=0.125 2023-12-22 07:46:49,656 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.881e+01 3.132e+01 4.118e+01 9.111e+01, threshold=6.264e+01, percent-clipped=8.0 2023-12-22 07:46:52,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=476866.6666666667, ans=0.0 2023-12-22 07:46:53,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=476866.6666666667, ans=0.125 2023-12-22 07:46:55,366 INFO [train.py:886] (1/4) Epoch 16, batch 50, loss[loss=0.01781, audio_tagging_loss=0.01781, over 25000.00 frames. ], tot_loss[loss=0.02253, audio_tagging_loss=0.02253, over 1120644.91 frames. ], batch size: 100, lr: 6.93e-03, grad_scale: 32.0 2023-12-22 07:46:57,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=476933.3333333333, ans=0.0 2023-12-22 07:47:28,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-12-22 07:47:36,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-12-22 07:47:41,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. 
limit=15.0 2023-12-22 07:47:44,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-12-22 07:47:47,682 INFO [train.py:886] (1/4) Epoch 16, batch 100, loss[loss=0.01775, audio_tagging_loss=0.01775, over 25000.00 frames. ], tot_loss[loss=0.01972, audio_tagging_loss=0.01972, over 1972003.74 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:48:12,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=477400.0, ans=0.1 2023-12-22 07:48:13,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.58 vs. limit=22.5 2023-12-22 07:48:33,204 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.658e+01 3.003e+01 3.220e+01 3.387e+01 3.937e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 07:48:38,858 INFO [train.py:886] (1/4) Epoch 16, batch 150, loss[loss=0.0158, audio_tagging_loss=0.0158, over 25000.00 frames. ], tot_loss[loss=0.01798, audio_tagging_loss=0.01798, over 2637834.69 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:48:43,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=477600.0, ans=0.0 2023-12-22 07:48:55,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=477666.6666666667, ans=0.09899494936611666 2023-12-22 07:49:10,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-12-22 07:49:14,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=477800.0, ans=0.0 2023-12-22 07:49:26,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=477866.6666666667, ans=0.125 2023-12-22 07:49:31,246 INFO [train.py:886] (1/4) Epoch 16, batch 200, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01686, audio_tagging_loss=0.01686, over 3155979.67 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:49:31,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.780e-03 2023-12-22 07:49:49,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=478000.0, ans=15.0 2023-12-22 07:50:08,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=478133.3333333333, ans=0.125 2023-12-22 07:50:16,274 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.761e+01 2.944e+01 3.069e+01 3.550e+01, threshold=5.887e+01, percent-clipped=0.0 2023-12-22 07:50:22,661 INFO [train.py:886] (1/4) Epoch 16, batch 250, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 3554292.30 frames. 
], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:50:22,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=478266.6666666667, ans=0.125 2023-12-22 07:50:31,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=478266.6666666667, ans=0.125 2023-12-22 07:50:39,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=478333.3333333333, ans=0.125 2023-12-22 07:50:42,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=478400.0, ans=0.125 2023-12-22 07:50:48,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=478400.0, ans=0.125 2023-12-22 07:50:56,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=478466.6666666667, ans=0.0 2023-12-22 07:51:04,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=478533.3333333333, ans=0.125 2023-12-22 07:51:15,088 INFO [train.py:886] (1/4) Epoch 16, batch 300, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24946.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 3857517.07 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:51:17,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=478600.0, ans=0.1 2023-12-22 07:51:20,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478600.0, ans=0.0 2023-12-22 07:51:25,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478666.6666666667, ans=0.1 2023-12-22 07:51:36,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=478733.3333333333, ans=0.125 2023-12-22 07:51:38,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.98 vs. limit=10.0 2023-12-22 07:52:00,601 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.872e+01 2.968e+01 3.150e+01 3.579e+01, threshold=5.936e+01, percent-clipped=0.0 2023-12-22 07:52:01,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0 2023-12-22 07:52:03,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-12-22 07:52:07,799 INFO [train.py:886] (1/4) Epoch 16, batch 350, loss[loss=0.01619, audio_tagging_loss=0.01619, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4093661.41 frames. 
], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:52:12,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=478933.3333333333, ans=0.125 2023-12-22 07:52:21,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=479000.0, ans=0.125 2023-12-22 07:52:40,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0 2023-12-22 07:52:47,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=22.5 2023-12-22 07:52:47,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2023-12-22 07:52:59,652 INFO [train.py:886] (1/4) Epoch 16, batch 400, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4281013.32 frames. ], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:52:59,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=479266.6666666667, ans=0.125 2023-12-22 07:53:07,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=12.0 2023-12-22 07:53:14,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=479333.3333333333, ans=0.0 2023-12-22 07:53:15,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=479333.3333333333, ans=0.0 2023-12-22 07:53:44,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=479533.3333333333, ans=0.1 2023-12-22 07:53:45,881 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.751e+01 2.885e+01 3.045e+01 3.501e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-22 07:53:51,543 INFO [train.py:886] (1/4) Epoch 16, batch 450, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4430769.20 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:53:53,629 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.161e-02 2023-12-22 07:54:00,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=479666.6666666667, ans=0.2 2023-12-22 07:54:04,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. 
limit=15.0 2023-12-22 07:54:06,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=479666.6666666667, ans=0.2 2023-12-22 07:54:10,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=479666.6666666667, ans=0.0 2023-12-22 07:54:23,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0 2023-12-22 07:54:43,147 INFO [train.py:886] (1/4) Epoch 16, batch 500, loss[loss=0.0159, audio_tagging_loss=0.0159, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4548591.04 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:54:51,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=479933.3333333333, ans=0.125 2023-12-22 07:54:57,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-22 07:55:14,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=480066.6666666667, ans=0.0 2023-12-22 07:55:31,301 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 2.740e+01 2.884e+01 3.004e+01 3.376e+01, threshold=5.768e+01, percent-clipped=0.0 2023-12-22 07:55:31,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=12.0 2023-12-22 07:55:35,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=480200.0, ans=0.04949747468305833 2023-12-22 07:55:37,649 INFO [train.py:886] (1/4) Epoch 16, batch 550, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4636675.44 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:56:21,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=480533.3333333333, ans=0.125 2023-12-22 07:56:29,826 INFO [train.py:886] (1/4) Epoch 16, batch 600, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4707687.50 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:56:53,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.84 vs. limit=22.5 2023-12-22 07:57:03,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=15.0 2023-12-22 07:57:07,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.83 vs. limit=8.0 2023-12-22 07:57:14,893 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 2.812e+01 2.929e+01 3.069e+01 3.591e+01, threshold=5.857e+01, percent-clipped=0.0 2023-12-22 07:57:21,254 INFO [train.py:886] (1/4) Epoch 16, batch 650, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4757126.16 frames. 
], batch size: 100, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:57:27,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2023-12-22 07:57:29,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=480933.3333333333, ans=0.04949747468305833 2023-12-22 07:57:37,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=481000.0, ans=0.125 2023-12-22 07:57:47,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481066.6666666667, ans=0.125 2023-12-22 07:58:13,150 INFO [train.py:886] (1/4) Epoch 16, batch 700, loss[loss=0.0153, audio_tagging_loss=0.0153, over 24750.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4801039.98 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:58:34,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=481400.0, ans=0.0 2023-12-22 07:58:44,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481466.6666666667, ans=0.125 2023-12-22 07:58:58,093 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.524e-02 2023-12-22 07:58:58,831 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.788e+01 2.907e+01 3.038e+01 3.343e+01, threshold=5.814e+01, percent-clipped=0.0 2023-12-22 07:59:05,262 INFO [train.py:886] (1/4) Epoch 16, batch 750, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4833136.97 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 07:59:12,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=22.5 2023-12-22 07:59:21,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481666.6666666667, ans=0.1 2023-12-22 07:59:25,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=481733.3333333333, ans=0.0 2023-12-22 07:59:45,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=481866.6666666667, ans=0.2 2023-12-22 07:59:45,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=481866.6666666667, ans=0.125 2023-12-22 07:59:56,145 INFO [train.py:886] (1/4) Epoch 16, batch 800, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4859097.19 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:00:02,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. 
limit=15.0 2023-12-22 08:00:27,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=482133.3333333333, ans=0.125 2023-12-22 08:00:32,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=482133.3333333333, ans=0.0 2023-12-22 08:00:42,819 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.804e+01 2.918e+01 3.049e+01 4.122e+01, threshold=5.837e+01, percent-clipped=0.0 2023-12-22 08:00:43,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=482200.0, ans=0.125 2023-12-22 08:00:49,187 INFO [train.py:886] (1/4) Epoch 16, batch 850, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4871807.11 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:00:53,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-22 08:00:54,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=482266.6666666667, ans=0.0 2023-12-22 08:00:55,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=482266.6666666667, ans=0.125 2023-12-22 08:01:13,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5 2023-12-22 08:01:25,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=482466.6666666667, ans=0.0 2023-12-22 08:01:31,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2023-12-22 08:01:39,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=482600.0, ans=0.0 2023-12-22 08:01:40,643 INFO [train.py:886] (1/4) Epoch 16, batch 900, loss[loss=0.0156, audio_tagging_loss=0.0156, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4896317.16 frames. ], batch size: 99, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:02:01,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=482733.3333333333, ans=0.125 2023-12-22 08:02:26,294 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.779e+01 2.893e+01 3.122e+01 3.718e+01, threshold=5.785e+01, percent-clipped=0.0 2023-12-22 08:02:31,928 INFO [train.py:886] (1/4) Epoch 16, batch 950, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4900770.01 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:02:41,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=483000.0, ans=0.2 2023-12-22 08:02:41,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. 
limit=6.0 2023-12-22 08:02:42,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=483000.0, ans=0.2 2023-12-22 08:03:03,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=483133.3333333333, ans=0.0 2023-12-22 08:03:04,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=483133.3333333333, ans=0.125 2023-12-22 08:03:04,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2023-12-22 08:03:13,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=483200.0, ans=0.2 2023-12-22 08:03:17,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=483200.0, ans=0.125 2023-12-22 08:03:25,266 INFO [train.py:886] (1/4) Epoch 16, batch 1000, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4911027.28 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:03:28,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=483266.6666666667, ans=0.0 2023-12-22 08:04:05,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=483533.3333333333, ans=0.125 2023-12-22 08:04:09,566 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.390e+01 2.722e+01 2.865e+01 3.109e+01 3.615e+01, threshold=5.730e+01, percent-clipped=0.0 2023-12-22 08:04:15,959 INFO [train.py:886] (1/4) Epoch 16, batch 1050, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4922009.17 frames. 
], batch size: 100, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:04:19,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=483600.0, ans=0.2 2023-12-22 08:04:26,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=483666.6666666667, ans=0.125 2023-12-22 08:04:35,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=483666.6666666667, ans=0.125 2023-12-22 08:04:41,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=483733.3333333333, ans=0.125 2023-12-22 08:04:53,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=483800.0, ans=0.2 2023-12-22 08:05:02,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=483866.6666666667, ans=0.125 2023-12-22 08:05:03,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=483866.6666666667, ans=0.125 2023-12-22 08:05:03,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=483866.6666666667, ans=0.2 2023-12-22 08:05:08,399 INFO [train.py:886] (1/4) Epoch 16, batch 1100, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4928304.14 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:05:10,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.57 vs. limit=15.0 2023-12-22 08:05:12,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=483933.3333333333, ans=0.0 2023-12-22 08:05:12,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=483933.3333333333, ans=0.2 2023-12-22 08:05:16,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=483933.3333333333, ans=0.0 2023-12-22 08:05:20,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484000.0, ans=0.1 2023-12-22 08:05:24,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2023-12-22 08:05:36,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=484066.6666666667, ans=0.125 2023-12-22 08:05:43,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.38 vs. limit=15.0 2023-12-22 08:05:53,773 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.525e+01 2.808e+01 2.906e+01 3.022e+01 3.549e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 08:05:59,465 INFO [train.py:886] (1/4) Epoch 16, batch 1150, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4934992.98 frames. 
], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:06:03,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=15.0 2023-12-22 08:06:08,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=484266.6666666667, ans=0.125 2023-12-22 08:06:14,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.85 vs. limit=22.5 2023-12-22 08:06:17,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.009e-02 2023-12-22 08:06:21,526 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:06:21,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=484400.0, ans=0.125 2023-12-22 08:06:38,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=484466.6666666667, ans=0.05 2023-12-22 08:06:51,520 INFO [train.py:886] (1/4) Epoch 16, batch 1200, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4936103.09 frames. ], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:06:57,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=484600.0, ans=0.0 2023-12-22 08:07:08,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=484666.6666666667, ans=0.0 2023-12-22 08:07:13,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=484733.3333333333, ans=0.2 2023-12-22 08:07:17,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=484733.3333333333, ans=15.0 2023-12-22 08:07:20,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=484733.3333333333, ans=0.025 2023-12-22 08:07:24,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=484800.0, ans=0.0 2023-12-22 08:07:24,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=484800.0, ans=0.07 2023-12-22 08:07:25,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=484800.0, ans=0.125 2023-12-22 08:07:26,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=484800.0, ans=0.07 2023-12-22 08:07:34,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=484866.6666666667, ans=0.07 2023-12-22 08:07:35,752 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.792e+01 2.945e+01 3.119e+01 3.482e+01, threshold=5.890e+01, percent-clipped=0.0 2023-12-22 08:07:37,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=484866.6666666667, ans=0.125 
2023-12-22 08:07:38,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=484866.6666666667, ans=0.125 2023-12-22 08:07:40,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2023-12-22 08:07:41,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=484933.3333333333, ans=0.0 2023-12-22 08:07:42,917 INFO [train.py:886] (1/4) Epoch 16, batch 1250, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4934220.27 frames. ], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:07:44,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=484933.3333333333, ans=0.125 2023-12-22 08:08:04,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=485066.6666666667, ans=0.0 2023-12-22 08:08:05,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=485066.6666666667, ans=0.0 2023-12-22 08:08:33,974 INFO [train.py:886] (1/4) Epoch 16, batch 1300, loss[loss=0.01646, audio_tagging_loss=0.01646, over 24750.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4937414.73 frames. ], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:09:05,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-12-22 08:09:15,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=485533.3333333333, ans=10.0 2023-12-22 08:09:15,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485533.3333333333, ans=0.125 2023-12-22 08:09:17,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-12-22 08:09:19,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=485533.3333333333, ans=0.0 2023-12-22 08:09:20,877 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.809e+01 2.938e+01 3.092e+01 3.584e+01, threshold=5.876e+01, percent-clipped=0.0 2023-12-22 08:09:23,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=485533.3333333333, ans=0.2 2023-12-22 08:09:26,555 INFO [train.py:886] (1/4) Epoch 16, batch 1350, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4937995.62 frames. ], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:09:31,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=485600.0, ans=0.0 2023-12-22 08:09:39,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.47 vs. 
limit=22.5 2023-12-22 08:09:40,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-12-22 08:10:05,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=485800.0, ans=0.2 2023-12-22 08:10:13,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=485866.6666666667, ans=0.0 2023-12-22 08:10:14,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.37 vs. limit=22.5 2023-12-22 08:10:18,064 INFO [train.py:886] (1/4) Epoch 16, batch 1400, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4942520.05 frames. ], batch size: 100, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:10:30,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486000.0, ans=0.1 2023-12-22 08:10:46,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=486066.6666666667, ans=0.125 2023-12-22 08:11:03,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=486200.0, ans=0.125 2023-12-22 08:11:04,395 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.416e+01 2.787e+01 2.895e+01 3.084e+01 3.427e+01, threshold=5.790e+01, percent-clipped=0.0 2023-12-22 08:11:10,102 INFO [train.py:886] (1/4) Epoch 16, batch 1450, loss[loss=0.01389, audio_tagging_loss=0.01389, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4949383.06 frames. ], batch size: 100, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:11:24,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=486333.3333333333, ans=0.125 2023-12-22 08:11:35,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2023-12-22 08:11:36,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=486400.0, ans=0.125 2023-12-22 08:11:50,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=486533.3333333333, ans=0.0 2023-12-22 08:12:01,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=486600.0, ans=0.125 2023-12-22 08:12:02,648 INFO [train.py:886] (1/4) Epoch 16, batch 1500, loss[loss=0.0139, audio_tagging_loss=0.0139, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4953431.30 frames. ], batch size: 100, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:12:04,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=486600.0, ans=0.2 2023-12-22 08:12:06,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. 
limit=15.0 2023-12-22 08:12:34,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2023-12-22 08:12:40,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=486800.0, ans=0.125 2023-12-22 08:12:48,469 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+01 2.830e+01 2.960e+01 3.074e+01 3.564e+01, threshold=5.919e+01, percent-clipped=0.0 2023-12-22 08:12:54,857 INFO [train.py:886] (1/4) Epoch 16, batch 1550, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4947804.72 frames. ], batch size: 99, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:13:15,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=487066.6666666667, ans=0.0 2023-12-22 08:13:19,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=487066.6666666667, ans=0.0 2023-12-22 08:13:27,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2023-12-22 08:13:44,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2023-12-22 08:13:47,238 INFO [train.py:886] (1/4) Epoch 16, batch 1600, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4944538.13 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:13:57,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=487333.3333333333, ans=0.2 2023-12-22 08:14:32,412 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.798e+01 2.942e+01 3.066e+01 3.582e+01, threshold=5.884e+01, percent-clipped=0.0 2023-12-22 08:14:38,774 INFO [train.py:886] (1/4) Epoch 16, batch 1650, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4949380.64 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:14:43,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=487600.0, ans=0.125 2023-12-22 08:15:02,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=487733.3333333333, ans=0.0 2023-12-22 08:15:25,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487866.6666666667, ans=0.1 2023-12-22 08:15:26,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.37 vs. limit=15.0 2023-12-22 08:15:31,043 INFO [train.py:886] (1/4) Epoch 16, batch 1700, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4953542.85 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:16:01,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. 
limit=15.0 2023-12-22 08:16:05,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=488133.3333333333, ans=0.2 2023-12-22 08:16:11,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2023-12-22 08:16:16,268 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+01 2.752e+01 2.880e+01 3.018e+01 3.839e+01, threshold=5.760e+01, percent-clipped=0.0 2023-12-22 08:16:17,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=488200.0, ans=0.125 2023-12-22 08:16:21,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2023-12-22 08:16:22,697 INFO [train.py:886] (1/4) Epoch 16, batch 1750, loss[loss=0.01159, audio_tagging_loss=0.01159, over 21792.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4954845.35 frames. ], batch size: 107, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:16:24,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=488266.6666666667, ans=0.125 2023-12-22 08:16:33,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=488333.3333333333, ans=0.95 2023-12-22 08:16:42,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=488400.0, ans=0.2 2023-12-22 08:16:44,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=488400.0, ans=0.0 2023-12-22 08:16:49,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-12-22 08:17:13,906 INFO [train.py:886] (1/4) Epoch 16, batch 1800, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4956617.37 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:17:31,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=488666.6666666667, ans=0.2 2023-12-22 08:17:32,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=488666.6666666667, ans=0.125 2023-12-22 08:17:33,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=488733.3333333333, ans=0.125 2023-12-22 08:17:43,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=488733.3333333333, ans=0.5 2023-12-22 08:17:46,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=488800.0, ans=0.0 2023-12-22 08:17:48,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=488800.0, ans=0.0 2023-12-22 08:17:58,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. 
limit=15.0 2023-12-22 08:17:59,523 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.777e+01 2.946e+01 3.090e+01 3.583e+01, threshold=5.892e+01, percent-clipped=0.0 2023-12-22 08:18:05,226 INFO [train.py:886] (1/4) Epoch 16, batch 1850, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4955053.45 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:18:08,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=488933.3333333333, ans=0.0 2023-12-22 08:18:13,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-12-22 08:18:37,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=489133.3333333333, ans=0.0 2023-12-22 08:18:57,691 INFO [train.py:886] (1/4) Epoch 16, batch 1900, loss[loss=0.01771, audio_tagging_loss=0.01771, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4953332.08 frames. ], batch size: 99, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:19:43,101 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.607e+01 2.822e+01 2.946e+01 3.143e+01 3.927e+01, threshold=5.893e+01, percent-clipped=0.0 2023-12-22 08:19:49,453 INFO [train.py:886] (1/4) Epoch 16, batch 1950, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4951383.34 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:19:51,541 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:20:04,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489666.6666666667, ans=0.1 2023-12-22 08:20:20,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=489800.0, ans=0.0 2023-12-22 08:20:24,227 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:20:24,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=489800.0, ans=0.0 2023-12-22 08:20:41,330 INFO [train.py:886] (1/4) Epoch 16, batch 2000, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4952993.69 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:20:54,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=490000.0, ans=0.125 2023-12-22 08:20:54,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-22 08:21:01,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=490066.6666666667, ans=0.09899494936611666 2023-12-22 08:21:02,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. 
limit=15.0 2023-12-22 08:21:10,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=490066.6666666667, ans=0.0 2023-12-22 08:21:12,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=490133.3333333333, ans=0.025 2023-12-22 08:21:24,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-22 08:21:26,211 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.757e+01 2.905e+01 3.071e+01 3.430e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 08:21:33,348 INFO [train.py:886] (1/4) Epoch 16, batch 2050, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4948803.97 frames. ], batch size: 99, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:21:47,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=490333.3333333333, ans=0.125 2023-12-22 08:21:52,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490400.0, ans=0.1 2023-12-22 08:21:57,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=490400.0, ans=0.2 2023-12-22 08:21:58,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490400.0, ans=0.1 2023-12-22 08:22:11,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=490466.6666666667, ans=0.125 2023-12-22 08:22:18,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-12-22 08:22:20,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2023-12-22 08:22:23,482 INFO [train.py:886] (1/4) Epoch 16, batch 2100, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4951553.76 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:22:28,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-22 08:22:31,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=490600.0, ans=0.125 2023-12-22 08:22:31,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=490600.0, ans=0.1 2023-12-22 08:22:54,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=12.0 2023-12-22 08:23:09,917 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.753e+01 2.940e+01 3.068e+01 3.569e+01, threshold=5.880e+01, percent-clipped=0.0 2023-12-22 08:23:16,231 INFO [train.py:886] (1/4) Epoch 16, batch 2150, loss[loss=0.01617, audio_tagging_loss=0.01617, over 24950.00 frames. 
], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4951898.88 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:23:16,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=490933.3333333333, ans=0.0 2023-12-22 08:23:18,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=490933.3333333333, ans=0.0 2023-12-22 08:23:24,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=491000.0, ans=0.125 2023-12-22 08:23:27,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=491000.0, ans=0.125 2023-12-22 08:24:01,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2023-12-22 08:24:07,025 INFO [train.py:886] (1/4) Epoch 16, batch 2200, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4949931.95 frames. ], batch size: 99, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:24:12,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=491266.6666666667, ans=0.05 2023-12-22 08:24:14,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=491266.6666666667, ans=0.04949747468305833 2023-12-22 08:24:20,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=491333.3333333333, ans=0.0 2023-12-22 08:24:28,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491400.0, ans=0.125 2023-12-22 08:24:33,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.70 vs. limit=10.0 2023-12-22 08:24:41,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=491466.6666666667, ans=0.5 2023-12-22 08:24:49,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=491533.3333333333, ans=0.0 2023-12-22 08:24:50,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-12-22 08:24:53,268 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+01 2.816e+01 2.901e+01 3.052e+01 3.527e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 08:24:58,964 INFO [train.py:886] (1/4) Epoch 16, batch 2250, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4945678.48 frames. 
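The dense stream of scaling.py:213 entries records ScheduledFloat values: module hyperparameters such as dropout_p, the various *_skip_rate fields, and balancer bounds that are functions of batch_count rather than fixed constants (the out_proj.dropout_p entries above, for instance, have long since settled at ans=0.1). A minimal sketch of such a schedule is below, assuming piecewise-linear interpolation between (batch_count, value) knots; the class name and knot values are illustrative, not icefall's implementation.

```python
import bisect

class PiecewiseLinearSchedule:
    """Sketch of a batch_count-dependent hyperparameter: linear
    interpolation between sorted (batch_count, value) knots."""
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a dropout rate annealing from 0.3 to 0.1 over the first 20k
# batches, then constant, in the spirit of the ans= values logged above:
dropout_p = PiecewiseLinearSchedule((0, 0.3), (20000, 0.1))
print(dropout_p.value(0.0), dropout_p.value(10000.0), dropout_p.value(500000.0))
```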
], batch size: 99, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:25:04,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=491600.0, ans=0.125 2023-12-22 08:25:43,780 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:25:48,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-12-22 08:25:50,140 INFO [train.py:886] (1/4) Epoch 16, batch 2300, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4946846.03 frames. ], batch size: 100, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:26:08,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2023-12-22 08:26:11,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=492066.6666666667, ans=0.0 2023-12-22 08:26:11,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=492066.6666666667, ans=0.125 2023-12-22 08:26:16,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=492066.6666666667, ans=0.0 2023-12-22 08:26:21,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=492133.3333333333, ans=0.125 2023-12-22 08:26:35,517 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 2.719e+01 2.858e+01 3.036e+01 3.603e+01, threshold=5.716e+01, percent-clipped=0.0 2023-12-22 08:26:36,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=492200.0, ans=0.125 2023-12-22 08:26:41,330 INFO [train.py:886] (1/4) Epoch 16, batch 2350, loss[loss=0.01566, audio_tagging_loss=0.01566, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4950788.37 frames. ], batch size: 99, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:26:41,615 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:26:46,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=492266.6666666667, ans=0.125 2023-12-22 08:26:46,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492266.6666666667, ans=0.1 2023-12-22 08:27:06,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492400.0, ans=0.1 2023-12-22 08:27:10,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492400.0, ans=0.125 2023-12-22 08:27:11,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=492466.6666666667, ans=0.2 2023-12-22 08:27:19,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. 
limit=15.0 2023-12-22 08:27:29,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=492533.3333333333, ans=0.0 2023-12-22 08:27:33,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=492600.0, ans=0.125 2023-12-22 08:27:34,565 INFO [train.py:886] (1/4) Epoch 16, batch 2400, loss[loss=0.01459, audio_tagging_loss=0.01459, over 22759.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4946993.43 frames. ], batch size: 107, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:27:38,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-12-22 08:27:41,398 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:27:45,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=492666.6666666667, ans=0.2 2023-12-22 08:27:48,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=492666.6666666667, ans=0.125 2023-12-22 08:27:50,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=492666.6666666667, ans=0.1 2023-12-22 08:27:54,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=492733.3333333333, ans=0.125 2023-12-22 08:28:00,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=492733.3333333333, ans=0.125 2023-12-22 08:28:18,835 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+01 2.788e+01 2.902e+01 3.058e+01 4.154e+01, threshold=5.803e+01, percent-clipped=0.0 2023-12-22 08:28:25,287 INFO [train.py:886] (1/4) Epoch 16, batch 2450, loss[loss=0.01572, audio_tagging_loss=0.01572, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4950669.05 frames. ], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:28:37,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=493000.0, ans=0.125 2023-12-22 08:28:46,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=493066.6666666667, ans=0.2 2023-12-22 08:28:49,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2023-12-22 08:28:55,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=493133.3333333333, ans=0.2 2023-12-22 08:29:03,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.75 vs. 
limit=22.5 2023-12-22 08:29:07,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=493200.0, ans=0.125 2023-12-22 08:29:11,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=493200.0, ans=0.125 2023-12-22 08:29:17,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2023-12-22 08:29:17,814 INFO [train.py:886] (1/4) Epoch 16, batch 2500, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4948123.84 frames. ], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:29:18,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=12.0 2023-12-22 08:29:18,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=493266.6666666667, ans=0.0 2023-12-22 08:29:34,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=493333.3333333333, ans=0.1 2023-12-22 08:29:42,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=493400.0, ans=0.0 2023-12-22 08:29:49,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=493466.6666666667, ans=0.0 2023-12-22 08:30:03,600 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.809e+01 2.998e+01 3.086e+01 4.150e+01, threshold=5.995e+01, percent-clipped=0.0 2023-12-22 08:30:03,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493533.3333333333, ans=0.1 2023-12-22 08:30:09,930 INFO [train.py:886] (1/4) Epoch 16, batch 2550, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4941380.51 frames. ], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:30:19,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=493666.6666666667, ans=0.125 2023-12-22 08:30:23,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=493666.6666666667, ans=0.0 2023-12-22 08:31:01,090 INFO [train.py:886] (1/4) Epoch 16, batch 2600, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4944955.48 frames. 
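The scaling.py:1022 "Whitening" lines compare a measured statistic against a limit (e.g. metric=21.75 vs. limit=22.5 just above); the constraint only intervenes when a module's output covariance becomes too anisotropic. One plausible form of such a metric, assumed here rather than taken from scaling.py, is the mean squared eigenvalue of the per-group feature covariance divided by its squared mean eigenvalue, which equals 1.0 exactly for perfectly "white" features and grows as a few directions dominate:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (..., num_channels). Returns a scalar >= 1.0 that equals 1.0
    # when each group's feature covariance is isotropic. (Assumed form.)
    num_channels = x.shape[-1]
    c = num_channels // num_groups
    x = x.reshape(-1, num_groups, c).transpose(0, 1)      # (groups, N, c)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / x.shape[1]              # (groups, c, c)
    mean_eig = cov.diagonal(dim1=1, dim2=2).sum(-1) / c   # trace(C) / c
    mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / c         # trace(C @ C) / c
    return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).mean()

x = torch.randn(2000, 384)
print(whitening_metric(x))   # near its floor for decorrelated input
x[:, 0] *= 30.0              # one dominant channel: anisotropic
print(whitening_metric(x))   # much larger; this is what trips a limit
```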
], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:31:03,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=493933.3333333333, ans=0.125 2023-12-22 08:31:19,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=494000.0, ans=0.2 2023-12-22 08:31:22,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494066.6666666667, ans=0.1 2023-12-22 08:31:24,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=494066.6666666667, ans=0.125 2023-12-22 08:31:27,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=494066.6666666667, ans=0.125 2023-12-22 08:31:32,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=494133.3333333333, ans=0.04949747468305833 2023-12-22 08:31:41,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-12-22 08:31:46,509 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+01 2.802e+01 2.926e+01 3.072e+01 4.080e+01, threshold=5.852e+01, percent-clipped=0.0 2023-12-22 08:31:48,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=494200.0, ans=0.125 2023-12-22 08:31:52,135 INFO [train.py:886] (1/4) Epoch 16, batch 2650, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4948114.83 frames. ], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:31:55,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=494266.6666666667, ans=0.0 2023-12-22 08:32:08,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=494333.3333333333, ans=10.0 2023-12-22 08:32:34,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=494533.3333333333, ans=0.0 2023-12-22 08:32:44,385 INFO [train.py:886] (1/4) Epoch 16, batch 2700, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4951147.88 frames. ], batch size: 99, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:33:17,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=494800.0, ans=0.2 2023-12-22 08:33:29,569 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.769e+01 2.925e+01 3.079e+01 4.143e+01, threshold=5.850e+01, percent-clipped=0.0 2023-12-22 08:33:30,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=494866.6666666667, ans=0.015 2023-12-22 08:33:35,989 INFO [train.py:886] (1/4) Epoch 16, batch 2750, loss[loss=0.01583, audio_tagging_loss=0.01583, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4957088.41 frames. 
], batch size: 100, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:33:58,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=495066.6666666667, ans=10.0 2023-12-22 08:34:04,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=495066.6666666667, ans=0.0 2023-12-22 08:34:15,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=495133.3333333333, ans=0.125 2023-12-22 08:34:25,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-12-22 08:34:28,195 INFO [train.py:886] (1/4) Epoch 16, batch 2800, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4950015.50 frames. ], batch size: 99, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:34:44,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2023-12-22 08:34:51,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=495400.0, ans=0.0 2023-12-22 08:35:13,350 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+01 2.809e+01 2.977e+01 3.124e+01 3.459e+01, threshold=5.955e+01, percent-clipped=0.0 2023-12-22 08:35:19,684 INFO [train.py:886] (1/4) Epoch 16, batch 2850, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4946824.55 frames. ], batch size: 99, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:35:19,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=495600.0, ans=0.125 2023-12-22 08:35:41,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=495733.3333333333, ans=0.125 2023-12-22 08:35:50,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=495800.0, ans=0.125 2023-12-22 08:35:59,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2023-12-22 08:36:02,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=495866.6666666667, ans=0.125 2023-12-22 08:36:11,115 INFO [train.py:886] (1/4) Epoch 16, batch 2900, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4942958.23 frames. ], batch size: 99, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:36:17,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. 
limit=15.0 2023-12-22 08:36:34,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496066.6666666667, ans=0.0 2023-12-22 08:36:56,680 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.734e+01 2.897e+01 3.006e+01 3.535e+01, threshold=5.795e+01, percent-clipped=0.0 2023-12-22 08:37:03,077 INFO [train.py:886] (1/4) Epoch 16, batch 2950, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4949464.79 frames. ], batch size: 99, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:37:06,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=496266.6666666667, ans=0.015 2023-12-22 08:37:07,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=496266.6666666667, ans=15.0 2023-12-22 08:37:08,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=496266.6666666667, ans=0.2 2023-12-22 08:37:14,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2023-12-22 08:37:26,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2023-12-22 08:37:35,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=496466.6666666667, ans=0.09899494936611666 2023-12-22 08:37:39,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=496466.6666666667, ans=0.125 2023-12-22 08:37:54,008 INFO [train.py:886] (1/4) Epoch 16, batch 3000, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4950821.71 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:37:54,008 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 08:38:13,779 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3136, 4.5794, 5.1467, 4.7107], device='cuda:1') 2023-12-22 08:38:14,835 INFO [train.py:917] (1/4) Epoch 16, validation: loss=0.0344, audio_tagging_loss=0.0344, over 3737520.00 frames. 
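The zipformer.py:1858 line above, emitted during the validation pass, prints the entropy of one layer's attention weights per head. It is a quick sanity check that heads have neither collapsed onto a single frame (entropy near 0) nor gone uniform (entropy near the log of the sequence length). A sketch of that statistic, with shapes assumed for illustration:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), each row a distribution over
    # source positions. Returns the mean entropy in nats per head.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)   # (num_heads, tgt_len)
    return ent.mean(dim=-1)                            # (num_heads,)

attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(attn))  # a bit under log(100) ~= 4.6 per head
```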
2023-12-22 08:38:14,836 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 08:38:20,364 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:38:32,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=496666.6666666667, ans=0.125 2023-12-22 08:38:37,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=496733.3333333333, ans=0.0 2023-12-22 08:38:41,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496733.3333333333, ans=0.0 2023-12-22 08:39:01,434 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.766e+01 2.892e+01 3.038e+01 3.392e+01, threshold=5.783e+01, percent-clipped=0.0 2023-12-22 08:39:03,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=496866.6666666667, ans=0.125 2023-12-22 08:39:07,802 INFO [train.py:886] (1/4) Epoch 16, batch 3050, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4955162.45 frames. ], batch size: 99, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:39:20,476 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:39:28,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=497066.6666666667, ans=0.0 2023-12-22 08:39:28,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=497066.6666666667, ans=0.025 2023-12-22 08:39:32,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=497066.6666666667, ans=0.2 2023-12-22 08:39:35,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2023-12-22 08:39:51,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.50 vs. limit=10.0 2023-12-22 08:39:56,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=497200.0, ans=0.125 2023-12-22 08:39:58,678 INFO [train.py:886] (1/4) Epoch 16, batch 3100, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4954499.95 frames. 
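The periodic optim.py:484 warnings are diagnostics, not errors: they list the min/25%/median/75%/max of recent gradient norms together with the active clipping threshold. In this log the threshold tracks Clipping_scale=2.0 times the running median (e.g. threshold=5.783e+01 against a median of 2.892e+01 a few lines up), and percent-clipped=0.0 shows no batch actually exceeded it. Below is a standalone sketch of that scheme; icefall folds it into the ScaledAdam optimizer, so the class name and history length here are assumptions:

```python
import collections
import torch

class MedianGradClipper:
    """Sketch: clip to clipping_scale * median of recent gradient norms
    and report quartiles in the style of the optim.py:484 warnings."""
    def __init__(self, clipping_scale: float = 2.0, history: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=history)
        self.clipped = 0
        self.steps = 0

    def __call__(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        self.steps += 1
        hist = torch.tensor(list(self.norms))
        threshold = self.clipping_scale * hist.median().item()
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        q = hist.quantile(torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        print("grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}"
              + f", percent-clipped={100 * self.clipped / self.steps:.1f}")
        return norm
```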
], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:40:12,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=497333.3333333333, ans=0.125 2023-12-22 08:40:23,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=497400.0, ans=0.04949747468305833 2023-12-22 08:40:25,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=497400.0, ans=0.0 2023-12-22 08:40:45,395 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.521e+01 2.809e+01 2.944e+01 3.089e+01 3.657e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 08:40:51,056 INFO [train.py:886] (1/4) Epoch 16, batch 3150, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4947656.67 frames. ], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:40:54,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=497600.0, ans=0.07 2023-12-22 08:40:56,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=497600.0, ans=0.05 2023-12-22 08:40:59,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=497666.6666666667, ans=0.125 2023-12-22 08:40:59,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=497666.6666666667, ans=15.0 2023-12-22 08:41:14,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=497733.3333333333, ans=0.1 2023-12-22 08:41:27,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=497800.0, ans=0.0 2023-12-22 08:41:39,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=497866.6666666667, ans=0.125 2023-12-22 08:41:42,568 INFO [train.py:886] (1/4) Epoch 16, batch 3200, loss[loss=0.0154, audio_tagging_loss=0.0154, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4945162.14 frames. ], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:41:46,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=12.0 2023-12-22 08:42:00,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=498000.0, ans=0.1 2023-12-22 08:42:26,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=498200.0, ans=0.0 2023-12-22 08:42:27,664 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.733e+01 2.858e+01 3.051e+01 3.454e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 08:42:28,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=498200.0, ans=10.0 2023-12-22 08:42:30,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. 
limit=15.0 2023-12-22 08:42:33,353 INFO [train.py:886] (1/4) Epoch 16, batch 3250, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4952072.30 frames. ], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:42:40,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=498266.6666666667, ans=0.125 2023-12-22 08:42:46,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=498333.3333333333, ans=0.125 2023-12-22 08:42:57,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=498400.0, ans=0.125 2023-12-22 08:43:16,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.30 vs. limit=15.0 2023-12-22 08:43:26,557 INFO [train.py:886] (1/4) Epoch 16, batch 3300, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4952864.75 frames. ], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:43:33,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=498600.0, ans=0.0 2023-12-22 08:43:56,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=12.0 2023-12-22 08:44:02,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=498800.0, ans=0.2 2023-12-22 08:44:03,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=498800.0, ans=0.125 2023-12-22 08:44:11,338 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.738e+01 2.869e+01 3.079e+01 3.471e+01, threshold=5.738e+01, percent-clipped=0.0 2023-12-22 08:44:16,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-22 08:44:17,635 INFO [train.py:886] (1/4) Epoch 16, batch 3350, loss[loss=0.01485, audio_tagging_loss=0.01485, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4961556.57 frames. ], batch size: 100, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:44:28,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2023-12-22 08:44:33,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=499000.0, ans=0.95 2023-12-22 08:44:36,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=499000.0, ans=0.0 2023-12-22 08:44:56,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.94 vs. 
limit=12.0 2023-12-22 08:45:01,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=499200.0, ans=0.0 2023-12-22 08:45:05,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=499200.0, ans=0.125 2023-12-22 08:45:08,416 INFO [train.py:886] (1/4) Epoch 16, batch 3400, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4963206.57 frames. ], batch size: 100, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:45:11,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=499266.6666666667, ans=0.05 2023-12-22 08:45:23,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=499333.3333333333, ans=0.0 2023-12-22 08:45:39,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2023-12-22 08:45:42,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2023-12-22 08:45:49,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-12-22 08:45:52,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=499533.3333333333, ans=0.0 2023-12-22 08:45:53,380 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+01 2.800e+01 2.974e+01 3.102e+01 3.785e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 08:45:58,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=499600.0, ans=0.125 2023-12-22 08:45:59,705 INFO [train.py:886] (1/4) Epoch 16, batch 3450, loss[loss=0.01535, audio_tagging_loss=0.01535, over 24750.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4957313.50 frames. ], batch size: 99, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:45:59,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=499600.0, ans=0.0 2023-12-22 08:46:07,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=499600.0, ans=0.1 2023-12-22 08:46:11,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=499666.6666666667, ans=0.125 2023-12-22 08:46:12,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=499666.6666666667, ans=0.0 2023-12-22 08:46:37,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=499800.0, ans=10.0 2023-12-22 08:46:40,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=499800.0, ans=0.125 2023-12-22 08:46:51,492 INFO [train.py:886] (1/4) Epoch 16, batch 3500, loss[loss=0.01551, audio_tagging_loss=0.01551, over 21881.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4946395.09 frames. 
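With fp16 training enabled, the grad_scale column is the dynamic loss-scaling factor: it doubles from 32 to 64 at batch 2000 and from 64 to 128 at batch 4000, and the fall back to 64 by batch 4050 later in this section is the signature of an overflow-triggered halving. That pattern matches a scaler that grows every 2000 clean steps. A sketch using PyTorch's GradScaler; the model, optimizer, and exact growth_interval are assumptions chosen to reproduce the pattern, not the recipe's settings:

```python
import torch

model = torch.nn.Linear(80, 527).cuda()   # 80-dim fbank in, 527 events out
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(features, targets):
    # features: (N, 80) cuda tensor; targets: (N, 527) multi-hot floats
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(features)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, targets
        )
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales; skips the step on inf/nan
    scaler.update()                 # doubles if stable, halves on overflow
    return loss.detach(), scaler.get_scale()
```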
], batch size: 107, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:47:00,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=499933.3333333333, ans=0.125 2023-12-22 08:47:03,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=500000.0, ans=0.125 2023-12-22 08:47:06,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=500000.0, ans=0.125 2023-12-22 08:47:22,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=500133.3333333333, ans=0.125 2023-12-22 08:47:27,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=500133.3333333333, ans=0.125 2023-12-22 08:47:38,981 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.455e+01 2.826e+01 2.952e+01 3.104e+01 3.647e+01, threshold=5.905e+01, percent-clipped=0.0 2023-12-22 08:47:45,463 INFO [train.py:886] (1/4) Epoch 16, batch 3550, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4943253.38 frames. ], batch size: 99, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:48:10,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=500400.0, ans=0.125 2023-12-22 08:48:12,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=500400.0, ans=0.125 2023-12-22 08:48:13,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=500400.0, ans=0.07 2023-12-22 08:48:18,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=500466.6666666667, ans=0.125 2023-12-22 08:48:22,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-12-22 08:48:24,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=500466.6666666667, ans=0.2 2023-12-22 08:48:24,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2023-12-22 08:48:25,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=500533.3333333333, ans=0.125 2023-12-22 08:48:36,976 INFO [train.py:886] (1/4) Epoch 16, batch 3600, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4946761.68 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:48:38,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=500600.0, ans=0.035 2023-12-22 08:48:47,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=500666.6666666667, ans=0.5 2023-12-22 08:48:53,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.00 vs. 
limit=10.0 2023-12-22 08:48:54,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=500666.6666666667, ans=0.04949747468305833 2023-12-22 08:48:55,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500666.6666666667, ans=0.1 2023-12-22 08:48:56,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=500733.3333333333, ans=0.125 2023-12-22 08:49:04,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-12-22 08:49:16,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=500800.0, ans=0.125 2023-12-22 08:49:22,452 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.766e+01 2.870e+01 3.088e+01 3.484e+01, threshold=5.741e+01, percent-clipped=0.0 2023-12-22 08:49:28,816 INFO [train.py:886] (1/4) Epoch 16, batch 3650, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4947459.29 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:49:30,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=500933.3333333333, ans=0.05 2023-12-22 08:49:30,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=500933.3333333333, ans=0.2 2023-12-22 08:49:34,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2023-12-22 08:50:00,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=501133.3333333333, ans=0.0 2023-12-22 08:50:09,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2023-12-22 08:50:13,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=501200.0, ans=0.125 2023-12-22 08:50:21,101 INFO [train.py:886] (1/4) Epoch 16, batch 3700, loss[loss=0.01661, audio_tagging_loss=0.01661, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4951975.68 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:50:48,361 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:51:06,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=501533.3333333333, ans=22.5 2023-12-22 08:51:06,602 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.832e+01 2.951e+01 3.083e+01 3.444e+01, threshold=5.901e+01, percent-clipped=0.0 2023-12-22 08:51:06,828 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:51:13,038 INFO [train.py:886] (1/4) Epoch 16, batch 3750, loss[loss=0.01695, audio_tagging_loss=0.01695, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4948961.11 frames. 
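The lr column decays very gently (6.86e-03 at batch 1550 down to 6.76e-03 by batch 3650) because Zipformer recipes schedule it as a -0.25 power in both the batch index and the epoch (the "Eden" schedule). A sketch of that formula; the lr_batches/lr_epochs defaults below are the recipe's usual constants taken as assumptions:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule (sketch): nearly flat while batch << lr_batches
    and epoch << lr_epochs; each factor decays like t**-0.5 asymptotically."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With a base_lr of 0.045, partway through epoch 16 this lands at roughly
# 6.5e-3, the same range as the lr values printed above.
print(eden_lr(0.045, batch=79000, epoch=15.5))
```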
], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:51:21,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501600.0, ans=0.1 2023-12-22 08:51:25,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=501666.6666666667, ans=0.1 2023-12-22 08:51:31,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=501666.6666666667, ans=0.05 2023-12-22 08:51:33,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2023-12-22 08:51:57,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501866.6666666667, ans=0.125 2023-12-22 08:52:04,568 INFO [train.py:886] (1/4) Epoch 16, batch 3800, loss[loss=0.01557, audio_tagging_loss=0.01557, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4941059.02 frames. ], batch size: 99, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:52:05,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=501933.3333333333, ans=0.125 2023-12-22 08:52:11,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=501933.3333333333, ans=15.0 2023-12-22 08:52:18,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=502000.0, ans=0.2 2023-12-22 08:52:22,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2023-12-22 08:52:34,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2023-12-22 08:52:35,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=502133.3333333333, ans=0.125 2023-12-22 08:52:46,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-12-22 08:52:50,279 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.815e+01 2.951e+01 3.109e+01 3.599e+01, threshold=5.902e+01, percent-clipped=0.0 2023-12-22 08:52:56,721 INFO [train.py:886] (1/4) Epoch 16, batch 3850, loss[loss=0.01613, audio_tagging_loss=0.01613, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4940515.02 frames. ], batch size: 99, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:53:11,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. 
limit=15.0 2023-12-22 08:53:20,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502400.0, ans=0.125 2023-12-22 08:53:20,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=502400.0, ans=0.0 2023-12-22 08:53:42,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=502533.3333333333, ans=0.1 2023-12-22 08:53:48,105 INFO [train.py:886] (1/4) Epoch 16, batch 3900, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4944584.58 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:53:51,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=502600.0, ans=0.1 2023-12-22 08:53:53,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=502600.0, ans=0.1 2023-12-22 08:54:24,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=502800.0, ans=0.0 2023-12-22 08:54:34,024 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.745e+01 2.917e+01 3.077e+01 3.660e+01, threshold=5.833e+01, percent-clipped=0.0 2023-12-22 08:54:36,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2023-12-22 08:54:36,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=502866.6666666667, ans=0.125 2023-12-22 08:54:36,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=502866.6666666667, ans=0.125 2023-12-22 08:54:40,367 INFO [train.py:886] (1/4) Epoch 16, batch 3950, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4950510.48 frames. 
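Each train.py:886 line pairs the raw per-batch loss[...] with tot_loss[...], which moves far more smoothly and is reported over roughly 4.95M frames, about 200 batches' worth: a frame-weighted running aggregate that forgets old batches. The sketch below shows one way to get that behaviour with an exponential decay; the decay constant is an assumption picked so the steady-state frame count is ~5M, not a value read from train.py:

```python
class RunningLoss:
    """Sketch of a frame-weighted, exponentially decayed loss tracker.
    With decay=0.995 and ~25000 frames/batch, the effective window is
    ~200 batches, i.e. ~5M frames at steady state."""
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tot = RunningLoss()
tot.update(0.01315, 25000.0)   # numbers in the style of the log
print(f"tot_loss[loss={tot.value:.5f}, over {tot.frames:.2f} frames.]")
```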
], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:54:59,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503066.6666666667, ans=0.1 2023-12-22 08:55:01,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503066.6666666667, ans=0.125 2023-12-22 08:55:08,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=503066.6666666667, ans=0.05 2023-12-22 08:55:10,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=503133.3333333333, ans=0.0 2023-12-22 08:55:13,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=503133.3333333333, ans=0.125 2023-12-22 08:55:14,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=503133.3333333333, ans=0.0 2023-12-22 08:55:22,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=503200.0, ans=0.125 2023-12-22 08:55:25,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503200.0, ans=0.1 2023-12-22 08:55:31,474 INFO [train.py:886] (1/4) Epoch 16, batch 4000, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4954413.50 frames. ], batch size: 100, lr: 6.74e-03, grad_scale: 128.0 2023-12-22 08:55:47,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-12-22 08:56:04,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.39 vs. limit=12.0 2023-12-22 08:56:13,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=503533.3333333333, ans=0.125 2023-12-22 08:56:18,187 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.822e+01 2.939e+01 3.066e+01 3.812e+01, threshold=5.877e+01, percent-clipped=0.0 2023-12-22 08:56:19,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=22.5 2023-12-22 08:56:20,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=503533.3333333333, ans=0.125 2023-12-22 08:56:22,958 INFO [train.py:886] (1/4) Epoch 16, batch 4050, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4958236.51 frames. 
], batch size: 100, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:56:33,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=503666.6666666667, ans=0.0 2023-12-22 08:56:35,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503666.6666666667, ans=0.125 2023-12-22 08:56:48,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=503733.3333333333, ans=10.0 2023-12-22 08:56:49,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=503733.3333333333, ans=0.125 2023-12-22 08:56:57,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=503800.0, ans=0.0 2023-12-22 08:57:00,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=15.0 2023-12-22 08:57:05,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503866.6666666667, ans=0.125 2023-12-22 08:57:12,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=503866.6666666667, ans=0.0 2023-12-22 08:57:15,706 INFO [train.py:886] (1/4) Epoch 16, batch 4100, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4951630.71 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:57:19,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2023-12-22 08:57:19,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=503933.3333333333, ans=0.5 2023-12-22 08:57:55,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=504200.0, ans=0.125 2023-12-22 08:58:01,237 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 2.833e+01 2.971e+01 3.147e+01 3.748e+01, threshold=5.941e+01, percent-clipped=0.0 2023-12-22 08:58:06,729 INFO [train.py:886] (1/4) Epoch 16, batch 4150, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4950562.82 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:58:14,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=504266.6666666667, ans=0.125 2023-12-22 08:58:22,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=504333.3333333333, ans=0.0 2023-12-22 08:58:22,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-12-22 08:58:58,058 INFO [train.py:886] (1/4) Epoch 16, batch 4200, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4953684.43 frames. 
], batch size: 100, lr: 6.74e-03, grad_scale: 64.0 2023-12-22 08:59:13,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2023-12-22 08:59:20,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=504733.3333333333, ans=0.125 2023-12-22 08:59:23,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504733.3333333333, ans=0.1 2023-12-22 08:59:28,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=504800.0, ans=0.0 2023-12-22 08:59:29,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=504800.0, ans=0.0 2023-12-22 08:59:33,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-12-22 08:59:43,948 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.796e+01 2.928e+01 3.041e+01 3.614e+01, threshold=5.855e+01, percent-clipped=0.0 2023-12-22 08:59:48,637 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:59:49,384 INFO [train.py:886] (1/4) Epoch 16, batch 4250, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4952520.71 frames. ], batch size: 100, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 08:59:55,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=504933.3333333333, ans=0.0 2023-12-22 08:59:57,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0 2023-12-22 08:59:58,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=504933.3333333333, ans=0.125 2023-12-22 09:00:04,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=505000.0, ans=0.125 2023-12-22 09:00:06,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=505000.0, ans=0.125 2023-12-22 09:00:15,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505066.6666666667, ans=0.125 2023-12-22 09:00:40,929 INFO [train.py:886] (1/4) Epoch 16, batch 4300, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4956584.47 frames. ], batch size: 100, lr: 6.73e-03, grad_scale: 64.0
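
The [scaling.py:213] ScheduledFloat records trace hyperparameters (dropout probabilities, skip rates, balancer probs, bypass scale_min) that are functions of the global batch count rather than constants; each record logs the schedule's current value as ans=... at the given batch_count. The behaviour is piecewise-linear interpolation between (batch_count, value) breakpoints. A minimal sketch of the idea follows; the breakpoints below are invented, and the real class is ScheduledFloat in the recipe's scaling.py:

```python
class ScheduledFloat:
    """Float-valued hyperparameter interpolated over the global batch count.

    Sketch of the behaviour behind the 'ScheduledFloat: name=...,
    batch_count=..., ans=...' records; breakpoints here are examples only.
    """
    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) pairs
        self.batch_count = 0.0

    def __float__(self):
        x, pts = self.batch_count, self.points
        if x <= pts[0][0]:
            return float(pts[0][1])
        if x >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:          # linear interpolation inside a segment
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 504800.0
print(float(dropout_p))                # 0.1, past the final breakpoint
```

By this point in training (batch_count around 5e5) most schedules have long since reached their final values, which is why the same ans keeps repeating for a given name.
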
2023-12-22 09:00:42,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=505266.6666666667, ans=0.125 2023-12-22 09:00:51,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=505333.3333333333, ans=0.125 2023-12-22 09:00:54,955 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:01:02,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=8.0 2023-12-22 09:01:15,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.49 vs. limit=22.5 2023-12-22 09:01:16,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=505466.6666666667, ans=0.2 2023-12-22 09:01:21,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-12-22 09:01:24,588 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:01:27,806 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+01 2.822e+01 2.946e+01 3.100e+01 3.669e+01, threshold=5.892e+01, percent-clipped=0.0 2023-12-22 09:01:29,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505533.3333333333, ans=0.1 2023-12-22 09:01:33,247 INFO [train.py:886] (1/4) Epoch 16, batch 4350, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4959155.64 frames. ], batch size: 99, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 09:01:35,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505600.0, ans=0.1 2023-12-22 09:01:35,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=505600.0, ans=0.2 2023-12-22 09:01:53,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=505733.3333333333, ans=0.2 2023-12-22 09:01:58,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=505733.3333333333, ans=0.2 2023-12-22 09:02:02,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=505800.0, ans=0.0 2023-12-22 09:02:22,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505933.3333333333, ans=0.1 2023-12-22 09:02:23,189 INFO [train.py:886] (1/4) Epoch 16, batch 4400, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4955376.33 frames.
], batch size: 99, lr: 6.73e-03, grad_scale: 64.0 2023-12-22 09:02:26,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=505933.3333333333, ans=0.025 2023-12-22 09:02:30,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-12-22 09:02:46,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=506066.6666666667, ans=0.2 2023-12-22 09:02:49,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=506066.6666666667, ans=0.2 2023-12-22 09:02:55,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506133.3333333333, ans=0.1 2023-12-22 09:03:10,657 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 2.860e+01 2.974e+01 3.134e+01 3.649e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 09:03:15,415 INFO [train.py:886] (1/4) Epoch 16, batch 4450, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4949984.26 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:03:19,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506266.6666666667, ans=0.1 2023-12-22 09:03:26,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2023-12-22 09:03:39,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=506400.0, ans=0.0 2023-12-22 09:03:41,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=506400.0, ans=0.125 2023-12-22 09:03:59,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=506533.3333333333, ans=0.1 2023-12-22 09:04:00,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=506533.3333333333, ans=0.125 2023-12-22 09:04:06,921 INFO [train.py:886] (1/4) Epoch 16, batch 4500, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4950319.65 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0
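
grad_scale in the batch summaries is the dynamic fp16 loss scale: it doubled from 64.0 to 128.0 at batch 4000 after a long run of overflow-free steps, and was halved back to 64.0 by batch 4050 when an inf/nan gradient was encountered. That is the standard dynamic-loss-scaling loop; a sketch using PyTorch's stock GradScaler follows (the train_step wiring and the growth interval, which is PyTorch's default, are assumptions about the recipe, not a copy of train.py):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=64.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def train_step(model, criterion, optimizer, features, targets):
    optimizer.zero_grad()
    with autocast():                      # fp16 forward pass
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()         # backward on the scaled loss
    scaler.step(optimizer)                # unscales grads; skips the step on inf/nan
    scaler.update()                       # grows the scale, or halves it on overflow
    return loss.detach(), scaler.get_scale()
```

The sawtooth pattern of grad_scale through the rest of the log (64 -> 128 -> 64) is this grow/back-off cycle in action.
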
2023-12-22 09:04:09,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=506600.0, ans=0.04949747468305833 2023-12-22 09:04:25,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=506666.6666666667, ans=0.0 2023-12-22 09:04:28,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=506733.3333333333, ans=0.0 2023-12-22 09:04:55,316 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:04:55,934 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+01 2.756e+01 2.905e+01 3.063e+01 3.487e+01, threshold=5.810e+01, percent-clipped=0.0 2023-12-22 09:04:56,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506866.6666666667, ans=0.125 2023-12-22 09:05:00,649 INFO [train.py:886] (1/4) Epoch 16, batch 4550, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4944437.69 frames. ], batch size: 99, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:05:07,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=506933.3333333333, ans=0.025 2023-12-22 09:05:09,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-12-22 09:05:29,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=507066.6666666667, ans=0.0 2023-12-22 09:05:34,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=507133.3333333333, ans=0.0 2023-12-22 09:05:36,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=507133.3333333333, ans=0.125 2023-12-22 09:05:44,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507200.0, ans=0.1 2023-12-22 09:05:50,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507200.0, ans=0.1 2023-12-22 09:05:53,170 INFO [train.py:886] (1/4) Epoch 16, batch 4600, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4947477.43 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:05:57,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=507266.6666666667, ans=0.2 2023-12-22 09:05:58,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-12-22 09:06:05,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.95 vs.
limit=22.5 2023-12-22 09:06:14,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=507400.0, ans=0.125 2023-12-22 09:06:20,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=507400.0, ans=0.125 2023-12-22 09:06:39,327 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.771e+01 2.902e+01 3.112e+01 3.389e+01, threshold=5.805e+01, percent-clipped=0.0 2023-12-22 09:06:44,722 INFO [train.py:886] (1/4) Epoch 16, batch 4650, loss[loss=0.01668, audio_tagging_loss=0.01668, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4949427.93 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0 2023-12-22 09:07:25,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=507866.6666666667, ans=0.125 2023-12-22 09:07:33,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2023-12-22 09:07:36,016 INFO [train.py:886] (1/4) Epoch 16, batch 4700, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4945715.52 frames. ], batch size: 99, lr: 6.71e-03, grad_scale: 64.0 2023-12-22 09:07:41,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=507933.3333333333, ans=0.2 2023-12-22 09:07:50,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=508000.0, ans=0.0 2023-12-22 09:07:54,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=508066.6666666667, ans=0.0 2023-12-22 09:07:55,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5 2023-12-22 09:08:01,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. limit=10.0 2023-12-22 09:08:02,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=508066.6666666667, ans=0.0 2023-12-22 09:08:17,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2023-12-22 09:08:18,088 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.851e+01 3.014e+01 3.171e+01 4.011e+01, threshold=6.029e+01, percent-clipped=0.0 2023-12-22 09:08:23,115 INFO [train.py:886] (1/4) Epoch 16, batch 4750, loss[loss=0.01458, audio_tagging_loss=0.01458, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4940989.78 frames. 
], batch size: 99, lr: 6.71e-03, grad_scale: 64.0 2023-12-22 09:08:24,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508266.6666666667, ans=0.1 2023-12-22 09:08:32,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=508333.3333333333, ans=0.125 2023-12-22 09:08:59,116 INFO [train.py:886] (1/4) Epoch 17, batch 0, loss[loss=0.02862, audio_tagging_loss=0.02862, over 24008.00 frames. ], tot_loss[loss=0.02862, audio_tagging_loss=0.02862, over 24008.00 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 64.0 2023-12-22 09:08:59,117 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 09:09:08,273 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0602, 2.5881, 3.9369, 3.1850], device='cuda:1') 2023-12-22 09:09:20,287 INFO [train.py:917] (1/4) Epoch 17, validation: loss=0.03195, audio_tagging_loss=0.03195, over 3737520.00 frames. 2023-12-22 09:09:20,288 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 09:09:39,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=508506.6666666667, ans=0.0 2023-12-22 09:09:40,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=508506.6666666667, ans=0.2 2023-12-22 09:09:43,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.83 vs. limit=10.0 2023-12-22 09:10:10,899 INFO [train.py:886] (1/4) Epoch 17, batch 50, loss[loss=0.01623, audio_tagging_loss=0.01623, over 25000.00 frames. ], tot_loss[loss=0.02214, audio_tagging_loss=0.02214, over 1115540.78 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 64.0 2023-12-22 09:10:20,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=508773.3333333333, ans=0.0 2023-12-22 09:10:23,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508773.3333333333, ans=0.1 2023-12-22 09:10:28,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=508773.3333333333, ans=0.2 2023-12-22 09:10:38,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508840.0, ans=0.1 2023-12-22 09:10:38,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=508840.0, ans=0.125 2023-12-22 09:10:41,307 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.315e+01 3.576e+01 4.102e+01 9.303e+01, threshold=7.152e+01, percent-clipped=8.0 2023-12-22 09:10:52,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2023-12-22 09:11:03,185 INFO [train.py:886] (1/4) Epoch 17, batch 100, loss[loss=0.01779, audio_tagging_loss=0.01779, over 24876.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 1973887.44 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0
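
During each validation pass (train.py:909-918) the log also dumps an attn_weights_entropy tensor per instrumented module (zipformer.py:1858), with one entry per attention head. It is a collapse diagnostic: entropy near 0 means a head attends to a single frame, entropy near log(seq_len) means it is spread uniformly. A sketch of how such a diagnostic can be computed, assuming weights normalized over the source axis (the exact shapes and reduction in zipformer.py may differ):

```python
import torch

def attn_weights_entropy(attn_weights, eps=1e-20):
    """Mean entropy of attention distributions, one value per head.

    attn_weights: (num_heads, batch, tgt_len, src_len), rows summing to 1.
    Illustrative reconstruction of the logged diagnostic, not the exact
    zipformer.py code.
    """
    p = attn_weights.clamp(min=eps)
    entropy = -(p * p.log()).sum(dim=-1)   # (num_heads, batch, tgt_len)
    return entropy.mean(dim=(1, 2))        # average over batch and positions

# four heads attending uniformly over 50 frames -> entropy log(50) ~= 3.91
w = torch.full((4, 2, 50, 50), 1 / 50)
print(attn_weights_entropy(w))
```

Values like tensor([5.0602, 2.5881, 3.9369, 3.1850]) above therefore indicate that all four heads of that module still use reasonably broad attention.
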
2023-12-22 09:11:13,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=509106.6666666667, ans=0.125 2023-12-22 09:11:14,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=509106.6666666667, ans=0.0 2023-12-22 09:11:36,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-12-22 09:11:52,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=509306.6666666667, ans=0.125 2023-12-22 09:11:54,545 INFO [train.py:886] (1/4) Epoch 17, batch 150, loss[loss=0.01021, audio_tagging_loss=0.01021, over 25000.00 frames. ], tot_loss[loss=0.01746, audio_tagging_loss=0.01746, over 2632070.79 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:12:16,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=509506.6666666667, ans=0.0 2023-12-22 09:12:22,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=509506.6666666667, ans=0.125 2023-12-22 09:12:24,690 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.920e+01 3.033e+01 3.237e+01 3.873e+01, threshold=6.065e+01, percent-clipped=0.0 2023-12-22 09:12:30,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=509573.3333333333, ans=0.125 2023-12-22 09:12:33,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=509573.3333333333, ans=0.125 2023-12-22 09:12:37,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=509640.0, ans=0.1 2023-12-22 09:12:37,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509640.0, ans=0.1 2023-12-22 09:12:46,843 INFO [train.py:886] (1/4) Epoch 17, batch 200, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 3147966.31 frames.
], batch size: 100, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:12:47,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=509706.6666666667, ans=0.125 2023-12-22 09:12:58,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=509773.3333333333, ans=0.2 2023-12-22 09:13:00,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=509773.3333333333, ans=0.1 2023-12-22 09:13:06,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=509773.3333333333, ans=0.0 2023-12-22 09:13:10,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=509840.0, ans=0.125 2023-12-22 09:13:23,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=509906.6666666667, ans=0.125 2023-12-22 09:13:24,091 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.506e-03 2023-12-22 09:13:34,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=509973.3333333333, ans=0.125 2023-12-22 09:13:37,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=510040.0, ans=0.125 2023-12-22 09:13:39,430 INFO [train.py:886] (1/4) Epoch 17, batch 250, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 3550436.07 frames. ], batch size: 99, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:14:04,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510173.3333333333, ans=0.1 2023-12-22 09:14:08,578 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.772e+01 2.917e+01 3.041e+01 3.552e+01, threshold=5.833e+01, percent-clipped=0.0 2023-12-22 09:14:19,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510240.0, ans=0.1 2023-12-22 09:14:20,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=510306.6666666667, ans=0.0 2023-12-22 09:14:30,817 INFO [train.py:886] (1/4) Epoch 17, batch 300, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 3858058.00 frames. ], batch size: 99, lr: 6.50e-03, grad_scale: 64.0 2023-12-22 09:14:35,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510373.3333333333, ans=0.125 2023-12-22 09:14:35,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=510373.3333333333, ans=22.5 2023-12-22 09:14:37,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. 
limit=15.0 2023-12-22 09:14:37,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.45 vs. limit=10.0 2023-12-22 09:14:49,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0 2023-12-22 09:15:23,912 INFO [train.py:886] (1/4) Epoch 17, batch 350, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4097342.56 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:15:31,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=510706.6666666667, ans=0.05 2023-12-22 09:15:39,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=510773.3333333333, ans=0.125 2023-12-22 09:15:42,014 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:15:54,267 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.827e+01 3.010e+01 3.113e+01 3.707e+01, threshold=6.020e+01, percent-clipped=0.0 2023-12-22 09:15:56,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2023-12-22 09:16:04,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=14.33 vs. limit=15.0 2023-12-22 09:16:09,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=510973.3333333333, ans=0.125 2023-12-22 09:16:15,127 INFO [train.py:886] (1/4) Epoch 17, batch 400, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4286264.60 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:16:15,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=511040.0, ans=0.05 2023-12-22 09:16:15,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=15.0 2023-12-22 09:16:17,048 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:16:26,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=511106.6666666667, ans=0.2 2023-12-22 09:16:42,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=511173.3333333333, ans=0.125 2023-12-22 09:16:58,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=511306.6666666667, ans=0.125 2023-12-22 09:17:03,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.04 vs. limit=22.5 2023-12-22 09:17:07,489 INFO [train.py:886] (1/4) Epoch 17, batch 450, loss[loss=0.01696, audio_tagging_loss=0.01696, over 25000.00 frames. 
], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4436094.97 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:17:11,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=511373.3333333333, ans=0.0 2023-12-22 09:17:29,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=511506.6666666667, ans=0.0 2023-12-22 09:17:38,008 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.743e+01 2.910e+01 3.047e+01 3.599e+01, threshold=5.820e+01, percent-clipped=0.0 2023-12-22 09:17:58,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=511640.0, ans=0.0 2023-12-22 09:18:00,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.84 vs. limit=15.0 2023-12-22 09:18:00,488 INFO [train.py:886] (1/4) Epoch 17, batch 500, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4552933.36 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:18:03,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=511706.6666666667, ans=0.125 2023-12-22 09:18:05,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=511706.6666666667, ans=0.125 2023-12-22 09:18:14,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=511773.3333333333, ans=0.125 2023-12-22 09:18:14,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=511773.3333333333, ans=0.09899494936611666 2023-12-22 09:18:26,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2023-12-22 09:18:47,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=511973.3333333333, ans=0.125 2023-12-22 09:18:52,180 INFO [train.py:886] (1/4) Epoch 17, batch 550, loss[loss=0.01349, audio_tagging_loss=0.01349, over 23978.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4640060.30 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0 2023-12-22 09:19:04,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=512106.6666666667, ans=10.0 2023-12-22 09:19:14,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=512173.3333333333, ans=0.0 2023-12-22 09:19:22,633 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.788e+01 2.954e+01 3.144e+01 3.570e+01, threshold=5.909e+01, percent-clipped=0.0 2023-12-22 09:19:32,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-12-22 09:19:40,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. 
limit=15.0 2023-12-22 09:19:41,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=512306.6666666667, ans=0.125 2023-12-22 09:19:44,635 INFO [train.py:886] (1/4) Epoch 17, batch 600, loss[loss=0.01559, audio_tagging_loss=0.01559, over 24954.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4708401.48 frames. ], batch size: 100, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:19:53,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=512440.0, ans=0.0 2023-12-22 09:19:56,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2023-12-22 09:20:26,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-12-22 09:20:29,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=512640.0, ans=0.125 2023-12-22 09:20:36,490 INFO [train.py:886] (1/4) Epoch 17, batch 650, loss[loss=0.01536, audio_tagging_loss=0.01536, over 24750.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4755220.02 frames. ], batch size: 99, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:20:44,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=512706.6666666667, ans=0.0 2023-12-22 09:21:04,862 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 2.843e+01 2.927e+01 3.090e+01 3.715e+01, threshold=5.854e+01, percent-clipped=0.0 2023-12-22 09:21:18,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=512973.3333333333, ans=0.0 2023-12-22 09:21:20,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=512973.3333333333, ans=0.125 2023-12-22 09:21:27,351 INFO [train.py:886] (1/4) Epoch 17, batch 700, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24750.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4796742.51 frames. ], batch size: 99, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:21:33,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2023-12-22 09:21:34,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=513040.0, ans=0.125 2023-12-22 09:21:40,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=513106.6666666667, ans=0.125 2023-12-22 09:21:43,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=513106.6666666667, ans=0.125 2023-12-22 09:21:43,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.25 vs. 
limit=15.0 2023-12-22 09:22:04,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=513240.0, ans=0.0 2023-12-22 09:22:19,891 INFO [train.py:886] (1/4) Epoch 17, batch 750, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4834572.29 frames. ], batch size: 100, lr: 6.48e-03, grad_scale: 64.0 2023-12-22 09:22:33,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=513440.0, ans=0.125 2023-12-22 09:22:50,397 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 2.839e+01 2.963e+01 3.097e+01 3.616e+01, threshold=5.927e+01, percent-clipped=0.0 2023-12-22 09:23:08,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=15.0 2023-12-22 09:23:10,512 INFO [train.py:886] (1/4) Epoch 17, batch 800, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4856858.93 frames. ], batch size: 100, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:23:10,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-12-22 09:23:12,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=513706.6666666667, ans=0.5 2023-12-22 09:23:18,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=513706.6666666667, ans=0.2 2023-12-22 09:23:39,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=513840.0, ans=0.125 2023-12-22 09:23:58,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=513973.3333333333, ans=0.0 2023-12-22 09:23:59,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=513973.3333333333, ans=0.125 2023-12-22 09:24:03,870 INFO [train.py:886] (1/4) Epoch 17, batch 850, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4876250.12 frames. ], batch size: 100, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:24:08,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5 2023-12-22 09:24:21,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=514106.6666666667, ans=0.125 2023-12-22 09:24:22,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2023-12-22 09:24:24,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=514173.3333333333, ans=0.0 2023-12-22 09:24:24,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
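
The [scaling.py:1022] Whitening lines compare a per-module statistic against that module's whitening limit (here metric=12.51 vs. limit=15.0); when the metric exceeds the limit, the Whiten module applies a gradient penalty that pushes the activations' covariance back toward a multiple of the identity. The metric can be read as mean(eig^2) / mean(eig)^2 of the per-group feature covariance, equal to 1.0 when the features are perfectly white and growing with eigenvalue spread. A sketch under that reading (the actual computation in scaling.py may differ in detail):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """mean(eigenvalue^2) / mean(eigenvalue)^2 of per-group covariances.

    Equals 1.0 iff each group's covariance is proportional to I; an
    illustrative reconstruction of the 'metric=... vs. limit=...' value.
    """
    x = x.reshape(-1, x.shape[-1])                  # (frames, channels)
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x.transpose(0, 1)                           # (groups, frames, chans/group)
    cov = x.transpose(1, 2) @ x / num_frames        # per-group covariance
    d = cov.shape[-1]
    trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)  # sum of eigenvalues
    trace_of_sq = (cov * cov).sum(dim=(-2, -1))     # sum of squared eigenvalues
    return ((trace_of_sq / d) / (trace / d) ** 2).mean()

print(whitening_metric(torch.randn(1000, 512)).item())  # ~1.5: sampling noise above the ideal 1.0
```

Most records sit below their limits; the occasional one slightly over (e.g. metric=22.95 vs. limit=22.5 earlier in the log) is exactly where the penalty kicks in.
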
2023-12-22 09:24:28,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=514173.3333333333, ans=0.125 2023-12-22 09:24:31,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0 2023-12-22 09:24:34,349 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 2.760e+01 2.887e+01 3.062e+01 3.648e+01, threshold=5.774e+01, percent-clipped=0.0 2023-12-22 09:24:35,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=514240.0, ans=0.0 2023-12-22 09:24:36,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=514240.0, ans=0.125 2023-12-22 09:24:46,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=514306.6666666667, ans=0.05 2023-12-22 09:24:51,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2023-12-22 09:24:55,668 INFO [train.py:886] (1/4) Epoch 17, batch 900, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4895136.08 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:25:05,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=514440.0, ans=0.125 2023-12-22 09:25:08,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=514440.0, ans=0.1 2023-12-22 09:25:25,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=514573.3333333333, ans=0.125 2023-12-22 09:25:26,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=514573.3333333333, ans=0.025 2023-12-22 09:25:42,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=514640.0, ans=0.125 2023-12-22 09:25:42,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=514640.0, ans=0.125 2023-12-22 09:25:47,446 INFO [train.py:886] (1/4) Epoch 17, batch 950, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4905120.41 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:25:56,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=514773.3333333333, ans=0.125 2023-12-22 09:26:18,085 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.614e+01 2.797e+01 2.941e+01 3.077e+01 3.617e+01, threshold=5.883e+01, percent-clipped=0.0 2023-12-22 09:26:36,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2023-12-22 09:26:41,025 INFO [train.py:886] (1/4) Epoch 17, batch 1000, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames.
], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4908323.67 frames. ], batch size: 100, lr: 6.47e-03, grad_scale: 64.0 2023-12-22 09:26:47,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515040.0, ans=0.125 2023-12-22 09:26:52,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=515106.6666666667, ans=0.0 2023-12-22 09:26:56,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515106.6666666667, ans=0.1 2023-12-22 09:27:20,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=515306.6666666667, ans=0.0 2023-12-22 09:27:22,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=515306.6666666667, ans=0.125 2023-12-22 09:27:24,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=515306.6666666667, ans=0.0 2023-12-22 09:27:31,114 INFO [train.py:886] (1/4) Epoch 17, batch 1050, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4918265.34 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:27:47,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515440.0, ans=0.1 2023-12-22 09:27:55,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2023-12-22 09:28:01,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.471e+01 2.759e+01 2.948e+01 3.113e+01 4.038e+01, threshold=5.895e+01, percent-clipped=0.0 2023-12-22 09:28:09,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=515573.3333333333, ans=0.125 2023-12-22 09:28:13,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-12-22 09:28:18,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515640.0, ans=0.125 2023-12-22 09:28:24,300 INFO [train.py:886] (1/4) Epoch 17, batch 1100, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4926452.64 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:28:29,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=515706.6666666667, ans=0.0 2023-12-22 09:28:41,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. 
limit=6.0 2023-12-22 09:28:42,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=515773.3333333333, ans=0.0 2023-12-22 09:28:49,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=515840.0, ans=0.2 2023-12-22 09:28:56,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515906.6666666667, ans=0.1 2023-12-22 09:29:15,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=515973.3333333333, ans=0.125 2023-12-22 09:29:17,774 INFO [train.py:886] (1/4) Epoch 17, batch 1150, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4938157.11 frames. ], batch size: 99, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:29:23,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=516040.0, ans=0.0 2023-12-22 09:29:30,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=516106.6666666667, ans=0.125 2023-12-22 09:29:34,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.22 vs. limit=22.5 2023-12-22 09:29:45,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5 2023-12-22 09:29:47,301 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.790e+01 2.920e+01 3.045e+01 3.393e+01, threshold=5.839e+01, percent-clipped=0.0 2023-12-22 09:29:48,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2023-12-22 09:30:08,784 INFO [train.py:886] (1/4) Epoch 17, batch 1200, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4944367.04 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0 2023-12-22 09:30:10,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=516373.3333333333, ans=0.0 2023-12-22 09:30:12,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.45 vs. limit=22.5 2023-12-22 09:30:21,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=516440.0, ans=0.125 2023-12-22 09:30:36,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=516506.6666666667, ans=0.125 2023-12-22 09:30:49,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.53 vs. limit=10.0 2023-12-22 09:31:01,209 INFO [train.py:886] (1/4) Epoch 17, batch 1250, loss[loss=0.01567, audio_tagging_loss=0.01567, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4941993.01 frames. ], batch size: 99, lr: 6.46e-03, grad_scale: 64.0
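
Every [train.py:886] summary shows audio_tagging_loss equal to the total loss because the tagging recipe optimizes a single criterion: multi-label classification over the 527 AudioSet event classes. The conventional choice for that setup, and a consistent reading of these magnitudes, is binary cross-entropy on per-clip class logits; a sketch follows (the actual recipe's target preparation and reductions are not reproduced, and the batch contents are random placeholders):

```python
import torch
import torch.nn as nn

NUM_EVENTS = 527  # size of the AudioSet label set

# Multi-label tagging: an independent binary decision per event class,
# so the loss is binary cross-entropy applied to raw class logits.
criterion = nn.BCEWithLogitsLoss(reduction="mean")

logits = torch.randn(100, NUM_EVENTS)    # clip-level logits, batch of 100 cuts
targets = torch.zeros(100, NUM_EVENTS)   # multi-hot ground-truth labels
targets[torch.arange(100), torch.randint(0, NUM_EVENTS, (100,))] = 1.0

loss = criterion(logits, targets)
print(loss.item())
```

Averaged over 527 mostly-negative classes, a well-trained tagger drives this mean BCE to small values, consistent with the ~0.014 tot_loss plateau in the records above.
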
2023-12-22 09:31:31,599 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.867e+01 2.981e+01 3.113e+01 3.868e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 09:31:47,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-12-22 09:31:53,084 INFO [train.py:886] (1/4) Epoch 17, batch 1300, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4938486.04 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 128.0 2023-12-22 09:32:30,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=517240.0, ans=0.125 2023-12-22 09:32:36,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=517306.6666666667, ans=0.125 2023-12-22 09:32:45,470 INFO [train.py:886] (1/4) Epoch 17, batch 1350, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4944639.16 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:33:14,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-12-22 09:33:15,417 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.569e-03 2023-12-22 09:33:16,146 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.812e+01 2.965e+01 3.178e+01 3.888e+01, threshold=5.930e+01, percent-clipped=0.0 2023-12-22 09:33:23,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=517573.3333333333, ans=0.0 2023-12-22 09:33:38,006 INFO [train.py:886] (1/4) Epoch 17, batch 1400, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4945891.61 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:33:42,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2023-12-22 09:34:08,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=517906.6666666667, ans=0.0 2023-12-22 09:34:23,644 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.017e-03 2023-12-22 09:34:29,385 INFO [train.py:886] (1/4) Epoch 17, batch 1450, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4950014.64 frames.
], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:34:38,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518040.0, ans=0.1 2023-12-22 09:35:00,423 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.798e+01 2.926e+01 3.112e+01 3.814e+01, threshold=5.853e+01, percent-clipped=0.0 2023-12-22 09:35:08,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=518240.0, ans=0.0 2023-12-22 09:35:17,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=518306.6666666667, ans=0.0 2023-12-22 09:35:20,878 INFO [train.py:886] (1/4) Epoch 17, batch 1500, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4954121.70 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:35:36,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=518440.0, ans=0.0 2023-12-22 09:35:43,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5 2023-12-22 09:35:53,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=518573.3333333333, ans=0.0 2023-12-22 09:35:55,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=518573.3333333333, ans=0.125 2023-12-22 09:36:00,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=518640.0, ans=0.125 2023-12-22 09:36:01,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.39 vs. limit=12.0 2023-12-22 09:36:12,811 INFO [train.py:886] (1/4) Epoch 17, batch 1550, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4948962.81 frames. 
], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:36:27,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=518773.3333333333, ans=0.125 2023-12-22 09:36:34,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=518840.0, ans=0.0 2023-12-22 09:36:35,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518840.0, ans=0.1 2023-12-22 09:36:39,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=518840.0, ans=0.125 2023-12-22 09:36:44,192 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.447e+01 2.815e+01 2.986e+01 3.108e+01 3.824e+01, threshold=5.972e+01, percent-clipped=0.0 2023-12-22 09:36:52,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518906.6666666667, ans=0.1 2023-12-22 09:36:53,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=518973.3333333333, ans=0.125 2023-12-22 09:36:53,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.51 vs. limit=22.5 2023-12-22 09:37:02,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2023-12-22 09:37:03,983 INFO [train.py:886] (1/4) Epoch 17, batch 1600, loss[loss=0.01668, audio_tagging_loss=0.01668, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4942399.92 frames. ], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:37:09,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2023-12-22 09:37:20,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=519106.6666666667, ans=0.125 2023-12-22 09:37:43,335 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.204e-01 2023-12-22 09:37:45,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=519306.6666666667, ans=0.0 2023-12-22 09:37:56,792 INFO [train.py:886] (1/4) Epoch 17, batch 1650, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4936732.04 frames. ], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:38:27,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.92 vs. 
limit=15.0 2023-12-22 09:38:28,292 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.852e+01 2.976e+01 3.149e+01 3.480e+01, threshold=5.952e+01, percent-clipped=0.0 2023-12-22 09:38:41,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=519640.0, ans=0.05 2023-12-22 09:38:44,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-12-22 09:38:48,454 INFO [train.py:886] (1/4) Epoch 17, batch 1700, loss[loss=0.01489, audio_tagging_loss=0.01489, over 24750.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4937748.06 frames. ], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:38:51,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=519706.6666666667, ans=0.09899494936611666 2023-12-22 09:39:07,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2023-12-22 09:39:11,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=519840.0, ans=0.125 2023-12-22 09:39:23,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2023-12-22 09:39:30,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=519973.3333333333, ans=0.0 2023-12-22 09:39:31,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519973.3333333333, ans=0.1 2023-12-22 09:39:34,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-22 09:39:40,255 INFO [train.py:886] (1/4) Epoch 17, batch 1750, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4944146.16 frames. 
], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:39:53,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=520106.6666666667, ans=0.125 2023-12-22 09:40:06,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=520173.3333333333, ans=0.05 2023-12-22 09:40:11,184 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.793e+01 2.974e+01 3.087e+01 3.674e+01, threshold=5.948e+01, percent-clipped=0.0 2023-12-22 09:40:14,310 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.888e-02 2023-12-22 09:40:20,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=520306.6666666667, ans=0.125 2023-12-22 09:40:23,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520306.6666666667, ans=0.1 2023-12-22 09:40:28,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-12-22 09:40:33,492 INFO [train.py:886] (1/4) Epoch 17, batch 1800, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4944030.71 frames. ], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:40:34,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=520373.3333333333, ans=0.125 2023-12-22 09:40:41,438 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:40:48,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=520440.0, ans=0.0 2023-12-22 09:40:56,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2023-12-22 09:41:07,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=520573.3333333333, ans=0.125 2023-12-22 09:41:13,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=520640.0, ans=0.0 2023-12-22 09:41:23,653 INFO [train.py:886] (1/4) Epoch 17, batch 1850, loss[loss=0.01485, audio_tagging_loss=0.01485, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4941974.37 frames. 
], batch size: 99, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:41:25,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=520706.6666666667, ans=0.125 2023-12-22 09:41:46,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=520840.0, ans=0.0 2023-12-22 09:41:54,291 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.549e+01 2.846e+01 3.000e+01 3.166e+01 3.525e+01, threshold=6.000e+01, percent-clipped=0.0 2023-12-22 09:42:15,599 INFO [train.py:886] (1/4) Epoch 17, batch 1900, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4942043.48 frames. ], batch size: 99, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:42:24,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521106.6666666667, ans=0.1 2023-12-22 09:42:29,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2023-12-22 09:42:33,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521106.6666666667, ans=0.125 2023-12-22 09:42:46,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.14 vs. limit=22.5 2023-12-22 09:42:49,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=521240.0, ans=0.125 2023-12-22 09:42:58,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521306.6666666667, ans=0.1 2023-12-22 09:43:06,669 INFO [train.py:886] (1/4) Epoch 17, batch 1950, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4943936.64 frames. ], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:43:07,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-12-22 09:43:19,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=521440.0, ans=0.125 2023-12-22 09:43:28,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=521506.6666666667, ans=0.125 2023-12-22 09:43:37,155 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.757e+01 2.929e+01 3.098e+01 3.613e+01, threshold=5.858e+01, percent-clipped=0.0 2023-12-22 09:43:40,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=521573.3333333333, ans=0.0 2023-12-22 09:43:57,566 INFO [train.py:886] (1/4) Epoch 17, batch 2000, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4944940.62 frames. 
], batch size: 99, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:43:57,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=521706.6666666667, ans=0.125 2023-12-22 09:44:03,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-12-22 09:44:05,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=521706.6666666667, ans=0.2 2023-12-22 09:44:16,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521773.3333333333, ans=0.1 2023-12-22 09:44:17,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-12-22 09:44:29,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5 2023-12-22 09:44:38,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521973.3333333333, ans=0.1 2023-12-22 09:44:42,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=521973.3333333333, ans=0.125 2023-12-22 09:44:49,067 INFO [train.py:886] (1/4) Epoch 17, batch 2050, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4947208.03 frames. ], batch size: 100, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:44:52,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=522040.0, ans=0.025 2023-12-22 09:44:55,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=522040.0, ans=0.0 2023-12-22 09:44:56,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-12-22 09:45:16,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522173.3333333333, ans=0.1 2023-12-22 09:45:17,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=522173.3333333333, ans=0.0 2023-12-22 09:45:20,210 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.771e+01 2.888e+01 3.046e+01 3.628e+01, threshold=5.776e+01, percent-clipped=0.0 2023-12-22 09:45:24,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=522240.0, ans=0.125 2023-12-22 09:45:40,156 INFO [train.py:886] (1/4) Epoch 17, batch 2100, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4948531.59 frames. 
], batch size: 100, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:46:01,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=522506.6666666667, ans=0.1 2023-12-22 09:46:03,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=522506.6666666667, ans=0.0 2023-12-22 09:46:05,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522506.6666666667, ans=0.1 2023-12-22 09:46:28,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=8.0 2023-12-22 09:46:33,804 INFO [train.py:886] (1/4) Epoch 17, batch 2150, loss[loss=0.01557, audio_tagging_loss=0.01557, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4951381.01 frames. ], batch size: 99, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:46:34,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2023-12-22 09:46:46,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-12-22 09:46:57,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=522840.0, ans=0.125 2023-12-22 09:47:04,314 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+01 2.873e+01 2.993e+01 3.098e+01 3.427e+01, threshold=5.985e+01, percent-clipped=0.0 2023-12-22 09:47:12,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2023-12-22 09:47:15,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=522973.3333333333, ans=0.125 2023-12-22 09:47:17,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2023-12-22 09:47:25,519 INFO [train.py:886] (1/4) Epoch 17, batch 2200, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4945993.39 frames. ], batch size: 99, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:47:51,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523173.3333333333, ans=0.1 2023-12-22 09:47:52,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2023-12-22 09:48:08,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2023-12-22 09:48:17,376 INFO [train.py:886] (1/4) Epoch 17, batch 2250, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4948803.63 frames. 
], batch size: 100, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:48:22,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523373.3333333333, ans=0.1 2023-12-22 09:48:22,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=523373.3333333333, ans=0.2 2023-12-22 09:48:45,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=523506.6666666667, ans=0.0 2023-12-22 09:48:48,952 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.799e+01 2.927e+01 3.060e+01 4.133e+01, threshold=5.854e+01, percent-clipped=0.0 2023-12-22 09:48:49,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=523573.3333333333, ans=0.1 2023-12-22 09:48:52,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.32 vs. limit=22.5 2023-12-22 09:49:03,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2023-12-22 09:49:10,376 INFO [train.py:886] (1/4) Epoch 17, batch 2300, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24031.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4949916.29 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:49:36,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=523840.0, ans=0.2 2023-12-22 09:49:43,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=523906.6666666667, ans=0.0 2023-12-22 09:49:51,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-12-22 09:49:55,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=523973.3333333333, ans=0.0 2023-12-22 09:49:58,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=523973.3333333333, ans=0.1 2023-12-22 09:50:02,413 INFO [train.py:886] (1/4) Epoch 17, batch 2350, loss[loss=0.01533, audio_tagging_loss=0.01533, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4955359.21 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:50:05,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. 
limit=15.0 2023-12-22 09:50:08,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=524040.0, ans=0.125 2023-12-22 09:50:09,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=524040.0, ans=0.125 2023-12-22 09:50:09,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=524040.0, ans=0.125 2023-12-22 09:50:11,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524040.0, ans=0.125 2023-12-22 09:50:13,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2023-12-22 09:50:22,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=15.0 2023-12-22 09:50:25,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=524173.3333333333, ans=0.125 2023-12-22 09:50:33,834 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.768e+01 2.920e+01 3.076e+01 3.528e+01, threshold=5.841e+01, percent-clipped=0.0 2023-12-22 09:50:36,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-12-22 09:50:49,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=524306.6666666666, ans=0.125 2023-12-22 09:50:54,129 INFO [train.py:886] (1/4) Epoch 17, batch 2400, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4957557.39 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:51:11,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=524440.0, ans=0.125 2023-12-22 09:51:14,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524506.6666666666, ans=0.1 2023-12-22 09:51:19,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=524506.6666666666, ans=0.2 2023-12-22 09:51:22,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=524506.6666666666, ans=0.125 2023-12-22 09:51:28,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=524573.3333333334, ans=0.0 2023-12-22 09:51:46,797 INFO [train.py:886] (1/4) Epoch 17, batch 2450, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4964643.43 frames. 
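], batch size: 100, lr: 6.41e-03, grad_scale: 64.0

The recurring optim.py:484 WARNING lines report the quartiles of recent gradient norms together with a clipping threshold, and in every record here the threshold is almost exactly twice the median norm (Clipping_scale=2.0). A plausible reconstruction of that bookkeeping, sketched with invented data and names (the window size, the norm distribution, and the class name are assumptions, not the actual optim.py code):

    import random
    import statistics

    class AdaptiveClipper:
        def __init__(self, clipping_scale=2.0, window=100):
            self.scale = clipping_scale
            self.window = window   # assumed number of recent norms to keep
            self.norms = []
            self.clipped = 0
            self.steps = 0

        def step(self, grad_norm):
            # Returns the factor to scale the gradient by this step.
            self.norms = (self.norms + [grad_norm])[-self.window:]
            threshold = self.scale * statistics.median(self.norms)
            self.steps += 1
            if grad_norm > threshold:
                self.clipped += 1
                return threshold / grad_norm   # shrink down to the threshold
            return 1.0

        def report(self):
            qs = statistics.quantiles(self.norms, n=4)
            quartiles = [min(self.norms), *qs, max(self.norms)]
            pct = 100.0 * self.clipped / max(self.steps, 1)
            return ("grad-norm quartiles "
                    + " ".join(f"{q:.3e}" for q in quartiles)
                    + f", threshold={self.scale * statistics.median(self.norms):.3e}"
                    + f", percent-clipped={pct:.1f}")

    random.seed(0)
    clipper = AdaptiveClipper()
    for _ in range(100):
        clipper.step(random.gauss(29.0, 2.5))   # norms near those in the log
    print(clipper.report())

With norms this tightly distributed, a threshold of roughly twice the median is never exceeded, which matches the persistent percent-clipped=0.0 in the records.
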
2023-12-22 09:52:18,069 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 2.810e+01 2.991e+01 3.131e+01 3.656e+01, threshold=5.983e+01, percent-clipped=0.0 2023-12-22 09:52:36,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=524973.3333333334, ans=0.125 2023-12-22 09:52:38,603 INFO [train.py:886] (1/4) Epoch 17, batch 2500, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4961502.74 frames. ], batch size: 99, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:53:03,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=525173.3333333334, ans=0.125 2023-12-22 09:53:05,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=525173.3333333334, ans=0.0 2023-12-22 09:53:09,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=525240.0, ans=0.125 2023-12-22 09:53:15,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=525240.0, ans=0.04949747468305833 2023-12-22 09:53:16,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=525240.0, ans=0.125 2023-12-22 09:53:23,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=525306.6666666666, ans=0.125 2023-12-22 09:53:30,994 INFO [train.py:886] (1/4) Epoch 17, batch 2550, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4948596.18 frames. ], batch size: 99, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:53:31,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2023-12-22 09:53:34,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525373.3333333334, ans=0.125 2023-12-22 09:53:39,584 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:53:43,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=525440.0, ans=0.0 2023-12-22 09:54:02,061 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 2.815e+01 2.969e+01 3.145e+01 4.179e+01, threshold=5.937e+01, percent-clipped=0.0 2023-12-22 09:54:23,108 INFO [train.py:886] (1/4) Epoch 17, batch 2600, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4952326.76 frames.
], batch size: 100, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:54:23,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=525706.6666666666, ans=0.0 2023-12-22 09:54:44,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=525840.0, ans=0.0 2023-12-22 09:55:00,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=525906.6666666666, ans=0.0 2023-12-22 09:55:01,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2023-12-22 09:55:13,782 INFO [train.py:886] (1/4) Epoch 17, batch 2650, loss[loss=0.0146, audio_tagging_loss=0.0146, over 23975.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4954884.96 frames. ], batch size: 100, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:55:15,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=526040.0, ans=0.125 2023-12-22 09:55:18,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=526040.0, ans=0.125 2023-12-22 09:55:19,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-12-22 09:55:24,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=526106.6666666666, ans=0.5 2023-12-22 09:55:26,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=526106.6666666666, ans=0.125 2023-12-22 09:55:28,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=526106.6666666666, ans=10.0 2023-12-22 09:55:28,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526106.6666666666, ans=0.1 2023-12-22 09:55:36,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=526173.3333333334, ans=0.125 2023-12-22 09:55:38,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=526173.3333333334, ans=0.125 2023-12-22 09:55:44,585 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.500e+01 2.764e+01 2.889e+01 3.029e+01 3.511e+01, threshold=5.779e+01, percent-clipped=0.0 2023-12-22 09:55:53,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526240.0, ans=0.125 2023-12-22 09:56:03,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2023-12-22 09:56:06,518 INFO [train.py:886] (1/4) Epoch 17, batch 2700, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24006.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4955585.81 frames. 
], batch size: 100, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:56:06,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526373.3333333334, ans=0.125 2023-12-22 09:56:16,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=526440.0, ans=0.0 2023-12-22 09:56:20,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=15.0 2023-12-22 09:56:25,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=526506.6666666666, ans=15.0 2023-12-22 09:56:39,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=526573.3333333334, ans=0.125 2023-12-22 09:56:40,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526573.3333333334, ans=0.1 2023-12-22 09:56:42,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=526573.3333333334, ans=0.125 2023-12-22 09:56:47,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=526640.0, ans=0.125 2023-12-22 09:56:57,723 INFO [train.py:886] (1/4) Epoch 17, batch 2750, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4956955.18 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:56:57,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526706.6666666666, ans=0.1 2023-12-22 09:57:00,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=526706.6666666666, ans=0.125 2023-12-22 09:57:00,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-12-22 09:57:05,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=526706.6666666666, ans=0.0 2023-12-22 09:57:13,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=526773.3333333334, ans=0.0 2023-12-22 09:57:22,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.01 vs. limit=22.5 2023-12-22 09:57:28,224 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+01 2.829e+01 2.970e+01 3.111e+01 3.407e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 09:57:35,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526906.6666666666, ans=0.1 2023-12-22 09:57:50,857 INFO [train.py:886] (1/4) Epoch 17, batch 2800, loss[loss=0.016, audio_tagging_loss=0.016, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4953653.97 frames. 
], batch size: 99, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:57:53,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=527040.0, ans=10.0 2023-12-22 09:58:00,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.57 vs. limit=10.0 2023-12-22 09:58:02,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=527106.6666666666, ans=0.125 2023-12-22 09:58:04,194 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:58:05,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527106.6666666666, ans=0.1 2023-12-22 09:58:32,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=527306.6666666666, ans=0.125 2023-12-22 09:58:43,624 INFO [train.py:886] (1/4) Epoch 17, batch 2850, loss[loss=0.01809, audio_tagging_loss=0.01809, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4947339.15 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:58:45,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=527373.3333333334, ans=0.125 2023-12-22 09:58:48,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=527373.3333333334, ans=0.2 2023-12-22 09:58:48,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0 2023-12-22 09:58:57,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=527440.0, ans=0.125 2023-12-22 09:58:59,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2023-12-22 09:59:05,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=527506.6666666666, ans=0.035 2023-12-22 09:59:12,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=527506.6666666666, ans=0.0 2023-12-22 09:59:14,230 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.860e+01 2.998e+01 3.153e+01 3.451e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 09:59:14,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=527573.3333333334, ans=0.0 2023-12-22 09:59:23,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527573.3333333334, ans=0.1 2023-12-22 09:59:25,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=527640.0, ans=0.0 2023-12-22 09:59:34,625 INFO [train.py:886] (1/4) Epoch 17, batch 2900, loss[loss=0.01376, audio_tagging_loss=0.01376, over 21434.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4941983.67 frames. 
], batch size: 107, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:59:34,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527706.6666666666, ans=0.1 2023-12-22 10:00:26,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=528040.0, ans=0.125 2023-12-22 10:00:27,718 INFO [train.py:886] (1/4) Epoch 17, batch 2950, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4941479.81 frames. ], batch size: 100, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 10:00:39,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-12-22 10:00:46,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2023-12-22 10:00:47,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=528106.6666666666, ans=15.0 2023-12-22 10:00:52,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=528173.3333333334, ans=0.0 2023-12-22 10:00:59,346 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+01 2.819e+01 2.962e+01 3.151e+01 3.518e+01, threshold=5.925e+01, percent-clipped=0.0 2023-12-22 10:01:13,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5 2023-12-22 10:01:15,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=528306.6666666666, ans=0.0 2023-12-22 10:01:15,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=528306.6666666666, ans=0.0 2023-12-22 10:01:16,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=528306.6666666666, ans=0.125 2023-12-22 10:01:19,352 INFO [train.py:886] (1/4) Epoch 17, batch 3000, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4941230.28 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:01:19,352 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 10:01:31,028 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1634, 0.8449, 4.3232, 4.3779], device='cuda:1') 2023-12-22 10:01:40,065 INFO [train.py:917] (1/4) Epoch 17, validation: loss=0.03336, audio_tagging_loss=0.03336, over 3737520.00 frames. 
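
Each train.py:886 record above pairs the current batch's loss with a running tot_loss, and the frame count attached to tot_loss hovers near 5e6 rather than growing without bound, consistent with an exponentially decayed, frames-weighted average over roughly the last 200 batches of ~25000 frames. A minimal sketch of that bookkeeping under that assumption (the decay constant and class name are guesses for illustration, not the actual train.py code):

    class DecayedLoss:
        # Frames-weighted loss statistic with exponential decay.  A decay of
        # 1 - 1/200 gives a steady-state frame count of 200 * 25000 = 5.0e6,
        # close to the ~4.95e6 frames reported in the records above.
        def __init__(self, decay=1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of per-frame loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def avg(self):
            return self.loss_sum / max(self.frames, 1.0)

    tot = DecayedLoss()
    for step in range(1, 2001):
        tot.update(0.014, 25000.0)   # a typical batch from this epoch
    print(f"tot_loss[loss={tot.avg:.5f}, over {tot.frames:.2f} frames.]")

The validation records, by contrast, report a plain average over the full 3737520.00-frame dev set rather than a decayed window, which is why their frame count is identical on every validation pass.
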
2023-12-22 10:01:40,065 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 10:01:46,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=528373.3333333334, ans=0.125 2023-12-22 10:01:55,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=528440.0, ans=0.0 2023-12-22 10:02:06,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-12-22 10:02:33,299 INFO [train.py:886] (1/4) Epoch 17, batch 3050, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4948142.70 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:02:48,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528773.3333333334, ans=0.1 2023-12-22 10:02:51,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2023-12-22 10:02:57,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=528840.0, ans=0.125 2023-12-22 10:03:00,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=12.0 2023-12-22 10:03:01,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-12-22 10:03:03,845 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+01 2.845e+01 2.964e+01 3.082e+01 4.137e+01, threshold=5.927e+01, percent-clipped=0.0 2023-12-22 10:03:12,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=528906.6666666666, ans=0.2 2023-12-22 10:03:25,023 INFO [train.py:886] (1/4) Epoch 17, batch 3100, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4941915.95 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:03:34,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=529106.6666666666, ans=15.0 2023-12-22 10:03:55,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=529240.0, ans=0.0 2023-12-22 10:04:17,572 INFO [train.py:886] (1/4) Epoch 17, batch 3150, loss[loss=0.01442, audio_tagging_loss=0.01442, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4937303.14 frames. 
], batch size: 99, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:04:27,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=529440.0, ans=0.125 2023-12-22 10:04:29,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=529440.0, ans=0.2 2023-12-22 10:04:31,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529440.0, ans=0.1 2023-12-22 10:04:49,159 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.844e+01 2.967e+01 3.134e+01 3.611e+01, threshold=5.935e+01, percent-clipped=0.0 2023-12-22 10:04:51,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=529573.3333333334, ans=0.125 2023-12-22 10:04:55,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-22 10:05:02,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=529640.0, ans=0.125 2023-12-22 10:05:04,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-12-22 10:05:09,800 INFO [train.py:886] (1/4) Epoch 17, batch 3200, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4933799.65 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:05:15,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529706.6666666666, ans=0.1 2023-12-22 10:05:16,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=529706.6666666666, ans=0.0 2023-12-22 10:05:19,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=529773.3333333334, ans=0.0 2023-12-22 10:05:26,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-12-22 10:05:55,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2023-12-22 10:06:01,803 INFO [train.py:886] (1/4) Epoch 17, batch 3250, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4936977.01 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:06:14,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.74 vs. 
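limit=22.5

The scaling.py:1022 Whitening records compare a statistic of each module's activations against a scheduled limit; the record just above reports metric=23.74 against limit=22.5 for a 512-channel self-attention output. One statistic with this behaviour is the ratio mean(eig^2)/mean(eig)^2 of the per-group feature covariance, which is 1.0 for perfectly whitened features and approaches the per-group channel count when the energy concentrates in a few directions. A sketch of that idea on synthetic data (an illustration of one plausible metric, not the exact computation in scaling.py):

    import numpy as np

    def whitening_metric(x, num_groups):
        # x: (frames, channels).  Ratio mean(eig^2) / mean(eig)^2 of each
        # group's covariance, computed via traces to avoid an eigendecomposition.
        frames, channels = x.shape
        per_group = channels // num_groups
        metrics = []
        for g in range(num_groups):
            xg = x[:, g * per_group:(g + 1) * per_group]
            cov = xg.T @ xg / frames
            metrics.append(per_group * np.trace(cov @ cov) / np.trace(cov) ** 2)
        return float(np.mean(metrics))

    rng = np.random.default_rng(0)
    white = rng.normal(size=(10000, 64))
    scales = np.ones(64)
    scales[:4] = 10.0   # concentrate energy in 4 of the 64 channels
    print(f"metric={whitening_metric(white, 1):.2f} vs. limit=22.5")           # near 1
    print(f"metric={whitening_metric(white * scales, 1):.2f} vs. limit=22.5")  # much larger
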
2023-12-22 10:06:29,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=530173.3333333334, ans=0.125 2023-12-22 10:06:32,360 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.816e+01 2.913e+01 3.117e+01 3.932e+01, threshold=5.825e+01, percent-clipped=0.0 2023-12-22 10:06:35,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=530240.0, ans=15.0 2023-12-22 10:06:42,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.56 vs. limit=10.0 2023-12-22 10:06:43,917 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:06:49,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2023-12-22 10:06:52,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.85 vs. limit=10.0 2023-12-22 10:06:53,281 INFO [train.py:886] (1/4) Epoch 17, batch 3300, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4935540.59 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:06:56,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2023-12-22 10:07:08,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-12-22 10:07:10,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=530440.0, ans=0.125 2023-12-22 10:07:45,506 INFO [train.py:886] (1/4) Epoch 17, batch 3350, loss[loss=0.01239, audio_tagging_loss=0.01239, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4938854.32 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 128.0 2023-12-22 10:07:55,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=530773.3333333334, ans=0.125 2023-12-22 10:08:13,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0 2023-12-22 10:08:14,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=530840.0, ans=0.0 2023-12-22 10:08:17,807 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+01 2.799e+01 2.966e+01 3.159e+01 3.514e+01, threshold=5.932e+01, percent-clipped=0.0 2023-12-22 10:08:28,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=530973.3333333334, ans=0.125 2023-12-22 10:08:36,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs.
limit=6.0 2023-12-22 10:08:36,657 INFO [train.py:886] (1/4) Epoch 17, batch 3400, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4939372.67 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:09:02,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=531173.3333333334, ans=0.0 2023-12-22 10:09:02,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=531173.3333333334, ans=15.0 2023-12-22 10:09:05,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=531173.3333333334, ans=0.125 2023-12-22 10:09:07,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=531240.0, ans=0.05 2023-12-22 10:09:16,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=531240.0, ans=0.0 2023-12-22 10:09:30,004 INFO [train.py:886] (1/4) Epoch 17, batch 3450, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4940734.76 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:09:35,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-12-22 10:09:41,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=531440.0, ans=0.0 2023-12-22 10:10:02,502 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.535e+01 2.890e+01 3.022e+01 3.190e+01 3.509e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 10:10:02,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=531573.3333333334, ans=0.125 2023-12-22 10:10:12,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531640.0, ans=0.125 2023-12-22 10:10:16,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531640.0, ans=0.1 2023-12-22 10:10:21,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=531640.0, ans=0.2 2023-12-22 10:10:23,156 INFO [train.py:886] (1/4) Epoch 17, batch 3500, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4937755.95 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:10:24,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.22 vs. 
limit=22.5 2023-12-22 10:10:27,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=531706.6666666666, ans=0.125 2023-12-22 10:10:36,530 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:10:37,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531773.3333333334, ans=0.1 2023-12-22 10:10:44,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=531840.0, ans=0.2 2023-12-22 10:10:44,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=531840.0, ans=10.0 2023-12-22 10:10:47,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=531840.0, ans=0.0 2023-12-22 10:10:50,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-12-22 10:11:02,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-12-22 10:11:12,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=531973.3333333334, ans=0.125 2023-12-22 10:11:14,709 INFO [train.py:886] (1/4) Epoch 17, batch 3550, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4941313.15 frames. ], batch size: 99, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:11:19,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0 2023-12-22 10:11:31,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=532106.6666666666, ans=0.125 2023-12-22 10:11:36,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=532173.3333333334, ans=0.125 2023-12-22 10:11:36,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=532173.3333333334, ans=0.125 2023-12-22 10:11:46,396 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.431e+01 2.820e+01 2.970e+01 3.113e+01 4.129e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 10:11:49,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=532240.0, ans=0.125 2023-12-22 10:11:57,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2023-12-22 10:12:02,691 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:12:06,050 INFO [train.py:886] (1/4) Epoch 17, batch 3600, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4948785.08 frames. 
], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:12:15,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532440.0, ans=0.1 2023-12-22 10:12:24,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=532506.6666666666, ans=0.125 2023-12-22 10:12:32,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=12.0 2023-12-22 10:12:42,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=532573.3333333334, ans=0.125 2023-12-22 10:12:55,933 INFO [train.py:886] (1/4) Epoch 17, batch 3650, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4947572.94 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:12:56,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.87 vs. limit=22.5 2023-12-22 10:12:56,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-12-22 10:13:21,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=532840.0, ans=10.0 2023-12-22 10:13:22,272 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:13:27,615 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.518e+01 2.770e+01 2.907e+01 3.018e+01 3.500e+01, threshold=5.815e+01, percent-clipped=0.0 2023-12-22 10:13:34,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=532906.6666666666, ans=0.125 2023-12-22 10:13:36,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=532906.6666666666, ans=0.2 2023-12-22 10:13:48,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.27 vs. limit=22.5 2023-12-22 10:13:48,600 INFO [train.py:886] (1/4) Epoch 17, batch 3700, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4952953.78 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:13:49,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=533040.0, ans=0.2 2023-12-22 10:13:57,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=533106.6666666666, ans=0.125 2023-12-22 10:14:00,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=533106.6666666666, ans=0.0 2023-12-22 10:14:24,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=533240.0, ans=0.0 2023-12-22 10:14:25,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. 
limit=15.0 2023-12-22 10:14:26,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2023-12-22 10:14:37,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2023-12-22 10:14:42,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=533373.3333333334, ans=0.125 2023-12-22 10:14:43,268 INFO [train.py:886] (1/4) Epoch 17, batch 3750, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4955362.19 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:14:48,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533373.3333333334, ans=0.1 2023-12-22 10:15:14,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-12-22 10:15:14,892 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.851e+01 2.968e+01 3.092e+01 3.872e+01, threshold=5.935e+01, percent-clipped=0.0 2023-12-22 10:15:26,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=533640.0, ans=0.0 2023-12-22 10:15:33,549 INFO [train.py:886] (1/4) Epoch 17, batch 3800, loss[loss=0.01573, audio_tagging_loss=0.01573, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4949873.65 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:15:33,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533706.6666666666, ans=0.1 2023-12-22 10:15:42,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=533706.6666666666, ans=0.0 2023-12-22 10:15:58,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=533840.0, ans=0.0 2023-12-22 10:16:10,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533906.6666666666, ans=0.1 2023-12-22 10:16:26,369 INFO [train.py:886] (1/4) Epoch 17, batch 3850, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4946725.35 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:16:28,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=534040.0, ans=12.0 2023-12-22 10:16:38,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=534106.6666666666, ans=0.125 2023-12-22 10:16:58,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.795e+01 2.953e+01 3.065e+01 3.551e+01, threshold=5.907e+01, percent-clipped=0.0 2023-12-22 10:16:58,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. 
limit=6.0 2023-12-22 10:17:16,798 INFO [train.py:886] (1/4) Epoch 17, batch 3900, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4945057.63 frames. ], batch size: 99, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:17:43,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=534506.6666666666, ans=0.2 2023-12-22 10:17:58,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=534640.0, ans=0.2 2023-12-22 10:18:08,580 INFO [train.py:886] (1/4) Epoch 17, batch 3950, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4948341.45 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:18:17,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534773.3333333334, ans=0.1 2023-12-22 10:18:20,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=534773.3333333334, ans=0.125 2023-12-22 10:18:32,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. limit=10.0 2023-12-22 10:18:39,648 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.778e+01 2.901e+01 3.094e+01 3.779e+01, threshold=5.801e+01, percent-clipped=0.0 2023-12-22 10:18:45,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-12-22 10:18:48,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2023-12-22 10:18:58,647 INFO [train.py:886] (1/4) Epoch 17, batch 4000, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4949850.89 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0 2023-12-22 10:19:06,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=535040.0, ans=0.1 2023-12-22 10:19:15,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=535106.6666666666, ans=0.0 2023-12-22 10:19:15,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.90 vs. limit=22.5 2023-12-22 10:19:16,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=535106.6666666666, ans=0.125 2023-12-22 10:19:49,779 INFO [train.py:886] (1/4) Epoch 17, batch 4050, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4952132.14 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:19:56,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. 
limit=15.0 2023-12-22 10:20:04,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535440.0, ans=0.1 2023-12-22 10:20:22,060 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 2.893e+01 2.988e+01 3.133e+01 3.578e+01, threshold=5.975e+01, percent-clipped=0.0 2023-12-22 10:20:25,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=535573.3333333334, ans=0.125 2023-12-22 10:20:35,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=535640.0, ans=0.0 2023-12-22 10:20:42,128 INFO [train.py:886] (1/4) Epoch 17, batch 4100, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4949104.34 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:21:04,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=535840.0, ans=0.2 2023-12-22 10:21:32,761 INFO [train.py:886] (1/4) Epoch 17, batch 4150, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4950883.33 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:21:45,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=536106.6666666666, ans=0.2 2023-12-22 10:22:03,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-22 10:22:04,549 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+01 2.856e+01 2.974e+01 3.122e+01 3.730e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 10:22:16,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=536306.6666666666, ans=0.125 2023-12-22 10:22:24,187 INFO [train.py:886] (1/4) Epoch 17, batch 4200, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4950594.08 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:22:59,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2023-12-22 10:23:08,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-12-22 10:23:14,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=536706.6666666666, ans=0.125 2023-12-22 10:23:15,970 INFO [train.py:886] (1/4) Epoch 17, batch 4250, loss[loss=0.0156, audio_tagging_loss=0.0156, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4954278.25 frames. 
], batch size: 100, lr: 6.34e-03, grad_scale: 64.0 2023-12-22 10:23:16,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536706.6666666666, ans=0.1 2023-12-22 10:23:18,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=536706.6666666666, ans=15.0 2023-12-22 10:23:28,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=536773.3333333334, ans=0.0 2023-12-22 10:23:32,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-22 10:23:34,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=536773.3333333334, ans=0.2 2023-12-22 10:23:47,886 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.846e+01 2.960e+01 3.073e+01 3.606e+01, threshold=5.921e+01, percent-clipped=0.0 2023-12-22 10:23:50,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2023-12-22 10:23:52,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=536906.6666666666, ans=10.0 2023-12-22 10:24:06,825 INFO [train.py:886] (1/4) Epoch 17, batch 4300, loss[loss=0.009133, audio_tagging_loss=0.009133, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4958575.29 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0 2023-12-22 10:24:07,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=537040.0, ans=0.2 2023-12-22 10:24:20,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-12-22 10:24:23,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2023-12-22 10:24:30,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537173.3333333334, ans=0.125 2023-12-22 10:24:35,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=537173.3333333334, ans=0.2 2023-12-22 10:24:37,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=537240.0, ans=0.125 2023-12-22 10:24:38,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=537240.0, ans=0.125 2023-12-22 10:24:50,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-22 10:24:59,304 INFO [train.py:886] (1/4) Epoch 17, batch 4350, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4962994.80 frames. 
], batch size: 100, lr: 6.33e-03, grad_scale: 64.0 2023-12-22 10:24:59,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=12.0 2023-12-22 10:25:08,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=537440.0, ans=0.125 2023-12-22 10:25:18,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2023-12-22 10:25:28,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=537506.6666666666, ans=0.125 2023-12-22 10:25:30,834 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.833e+01 2.974e+01 3.137e+01 3.666e+01, threshold=5.948e+01, percent-clipped=0.0 2023-12-22 10:25:38,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=537573.3333333334, ans=0.125 2023-12-22 10:25:48,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=537640.0, ans=0.04949747468305833 2023-12-22 10:25:50,430 INFO [train.py:886] (1/4) Epoch 17, batch 4400, loss[loss=0.01474, audio_tagging_loss=0.01474, over 24054.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4959383.94 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0 2023-12-22 10:25:51,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=537706.6666666666, ans=0.2 2023-12-22 10:26:03,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=537773.3333333334, ans=0.0 2023-12-22 10:26:04,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=537773.3333333334, ans=0.125 2023-12-22 10:26:41,582 INFO [train.py:886] (1/4) Epoch 17, batch 4450, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4956474.15 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0 2023-12-22 10:26:41,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=538040.0, ans=0.0 2023-12-22 10:26:41,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=538040.0, ans=0.2 2023-12-22 10:26:42,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=538040.0, ans=0.125 2023-12-22 10:26:50,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538106.6666666666, ans=0.125 2023-12-22 10:26:55,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=538106.6666666666, ans=0.125 2023-12-22 10:26:57,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. 
limit=22.5 2023-12-22 10:26:58,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=538106.6666666666, ans=0.0 2023-12-22 10:27:04,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=538173.3333333334, ans=0.0 2023-12-22 10:27:13,178 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.528e+01 2.814e+01 2.976e+01 3.129e+01 3.598e+01, threshold=5.951e+01, percent-clipped=0.0 2023-12-22 10:27:28,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2023-12-22 10:27:32,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=538373.3333333334, ans=0.1 2023-12-22 10:27:32,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538373.3333333334, ans=0.125 2023-12-22 10:27:32,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=538373.3333333334, ans=0.0 2023-12-22 10:27:32,757 INFO [train.py:886] (1/4) Epoch 17, batch 4500, loss[loss=0.01604, audio_tagging_loss=0.01604, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4954350.38 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0 2023-12-22 10:27:37,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=538373.3333333334, ans=0.0 2023-12-22 10:27:41,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538440.0, ans=0.125 2023-12-22 10:27:42,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.82 vs. limit=22.5 2023-12-22 10:27:51,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2023-12-22 10:27:56,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-12-22 10:28:24,642 INFO [train.py:886] (1/4) Epoch 17, batch 4550, loss[loss=0.01361, audio_tagging_loss=0.01361, over 22586.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4956883.65 frames. 
], batch size: 107, lr: 6.32e-03, grad_scale: 64.0 2023-12-22 10:28:24,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=538706.6666666666, ans=0.125 2023-12-22 10:28:36,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=538773.3333333334, ans=0.0 2023-12-22 10:28:52,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538840.0, ans=0.1 2023-12-22 10:28:57,008 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 2.801e+01 2.924e+01 3.059e+01 3.634e+01, threshold=5.849e+01, percent-clipped=0.0 2023-12-22 10:29:03,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538906.6666666666, ans=0.125 2023-12-22 10:29:11,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538973.3333333334, ans=0.1 2023-12-22 10:29:16,009 INFO [train.py:886] (1/4) Epoch 17, batch 4600, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4952647.10 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0 2023-12-22 10:29:20,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=539040.0, ans=0.125 2023-12-22 10:29:22,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=539040.0, ans=0.1 2023-12-22 10:29:34,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=539106.6666666666, ans=0.125 2023-12-22 10:29:44,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539173.3333333334, ans=0.1 2023-12-22 10:29:48,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=539240.0, ans=0.09899494936611666 2023-12-22 10:29:50,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=539240.0, ans=0.125 2023-12-22 10:30:04,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=539306.6666666666, ans=0.125 2023-12-22 10:30:08,713 INFO [train.py:886] (1/4) Epoch 17, batch 4650, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4956174.86 frames. ], batch size: 100, lr: 6.32e-03, grad_scale: 64.0 2023-12-22 10:30:20,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. 
limit=6.0 2023-12-22 10:30:23,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=539440.0, ans=0.2 2023-12-22 10:30:31,917 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:30:38,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=539506.6666666666, ans=0.2 2023-12-22 10:30:38,262 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:30:41,873 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.874e+01 3.014e+01 3.111e+01 3.634e+01, threshold=6.028e+01, percent-clipped=0.0 2023-12-22 10:30:52,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=539640.0, ans=0.2 2023-12-22 10:30:58,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=539640.0, ans=0.2 2023-12-22 10:31:00,407 INFO [train.py:886] (1/4) Epoch 17, batch 4700, loss[loss=0.01713, audio_tagging_loss=0.01713, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4957816.27 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0 2023-12-22 10:31:00,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=22.5 2023-12-22 10:31:03,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=539706.6666666666, ans=0.05 2023-12-22 10:31:12,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539773.3333333334, ans=0.1 2023-12-22 10:31:13,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2023-12-22 10:31:48,496 INFO [train.py:886] (1/4) Epoch 17, batch 4750, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4952968.42 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0 2023-12-22 10:31:55,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=540040.0, ans=0.125 2023-12-22 10:32:26,601 INFO [train.py:886] (1/4) Epoch 18, batch 0, loss[loss=0.03424, audio_tagging_loss=0.03424, over 24014.00 frames. ], tot_loss[loss=0.03424, audio_tagging_loss=0.03424, over 24014.00 frames. ], batch size: 100, lr: 6.14e-03, grad_scale: 32.0 2023-12-22 10:32:26,601 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 10:32:47,814 INFO [train.py:917] (1/4) Epoch 18, validation: loss=0.03336, audio_tagging_loss=0.03336, over 3737520.00 frames. 
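The loss[...] and tot_loss[...] entries in this log are frame-weighted averages: each batch contributes its loss weighted by its frame count. The tot_loss frame total hovering near 4.95e6 suggests a decaying window rather than a full-epoch sum, since with roughly 25,000 frames per batch a per-batch decay of 0.995 settles at 25000 / 0.005 = 5.0e6 frames. Below is a minimal sketch of that bookkeeping under those assumptions; the class name and decay constant are illustrative, not icefall's actual MetricsTracker.

```python
# Sketch (assumed): frame-weighted loss with exponential forgetting.
# Icefall's real MetricsTracker may aggregate differently; the name
# DecayedFrameLoss and decay=0.995 are illustrative only.

class DecayedFrameLoss:
    def __init__(self, decay: float = 0.995) -> None:
        self.decay = decay      # forgetting factor applied per batch
        self.loss_sum = 0.0     # decayed sum of (batch loss * frames)
        self.frames = 0.0       # decayed sum of frames

    def update(self, batch_loss: float, num_frames: float) -> None:
        # Each batch entry, e.g. "loss=0.01198, over 25000.00 frames",
        # feeds one update; older batches fade out geometrically.
        self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)


tracker = DecayedFrameLoss()
for _ in range(2000):  # ~25k frames per batch, as in this log
    tracker.update(batch_loss=0.014, num_frames=25000.0)
# Steady-state frame count: 25000 / (1 - 0.995) = 5.0e6, close to the
# ~4.95e6 "over N frames" totals reported in the tot_loss entries.
print(f"tot_loss[loss={tracker.avg:.5f}, over {tracker.frames:.2f} frames.]")
```

The validation entries, by contrast, report the same 3737520.00-frame total at every epoch boundary, consistent with an undecayed sum over the full dev set.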
2023-12-22 10:32:47,815 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 10:32:50,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=540146.6666666666, ans=0.0 2023-12-22 10:32:57,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=540213.3333333334, ans=0.0 2023-12-22 10:33:02,899 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+01 2.896e+01 3.092e+01 3.384e+01 9.418e+01, threshold=6.184e+01, percent-clipped=7.0 2023-12-22 10:33:20,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-12-22 10:33:23,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=540346.6666666666, ans=0.125 2023-12-22 10:33:32,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=540413.3333333334, ans=0.0 2023-12-22 10:33:36,991 INFO [train.py:886] (1/4) Epoch 18, batch 50, loss[loss=0.01829, audio_tagging_loss=0.01829, over 25000.00 frames. ], tot_loss[loss=0.02198, audio_tagging_loss=0.02198, over 1113614.13 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0 2023-12-22 10:33:55,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=540546.6666666666, ans=0.125 2023-12-22 10:34:00,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2023-12-22 10:34:04,487 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:34:05,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=540613.3333333334, ans=0.125 2023-12-22 10:34:10,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=540680.0, ans=0.0 2023-12-22 10:34:26,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-12-22 10:34:30,072 INFO [train.py:886] (1/4) Epoch 18, batch 100, loss[loss=0.01355, audio_tagging_loss=0.01355, over 25000.00 frames. ], tot_loss[loss=0.0192, audio_tagging_loss=0.0192, over 1968928.65 frames. 
], batch size: 100, lr: 6.13e-03, grad_scale: 32.0 2023-12-22 10:34:35,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=540813.3333333334, ans=0.5 2023-12-22 10:34:45,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 3.148e+01 3.407e+01 3.744e+01 5.066e+01, threshold=6.815e+01, percent-clipped=0.0 2023-12-22 10:34:52,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=540946.6666666666, ans=10.0 2023-12-22 10:34:58,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=540946.6666666666, ans=0.0 2023-12-22 10:35:20,931 INFO [train.py:886] (1/4) Epoch 18, batch 150, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.0174, audio_tagging_loss=0.0174, over 2634847.17 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0 2023-12-22 10:35:28,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=541146.6666666666, ans=0.025 2023-12-22 10:35:43,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541280.0, ans=0.1 2023-12-22 10:35:47,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=541280.0, ans=0.0 2023-12-22 10:35:55,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=541346.6666666666, ans=0.0 2023-12-22 10:35:57,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=541346.6666666666, ans=0.125 2023-12-22 10:35:57,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-12-22 10:36:12,542 INFO [train.py:886] (1/4) Epoch 18, batch 200, loss[loss=0.01569, audio_tagging_loss=0.01569, over 24750.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 3143865.56 frames. ], batch size: 99, lr: 6.13e-03, grad_scale: 32.0 2023-12-22 10:36:29,161 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 2.889e+01 3.014e+01 3.175e+01 3.778e+01, threshold=6.028e+01, percent-clipped=0.0 2023-12-22 10:36:31,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0 2023-12-22 10:36:52,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541746.6666666666, ans=0.1 2023-12-22 10:37:04,255 INFO [train.py:886] (1/4) Epoch 18, batch 250, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 3544471.19 frames. 
], batch size: 100, lr: 6.13e-03, grad_scale: 32.0 2023-12-22 10:37:34,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=542013.3333333334, ans=0.1 2023-12-22 10:37:41,951 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:37:56,128 INFO [train.py:886] (1/4) Epoch 18, batch 300, loss[loss=0.01421, audio_tagging_loss=0.01421, over 24750.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 3848557.50 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0 2023-12-22 10:37:58,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=542146.6666666666, ans=0.125 2023-12-22 10:38:14,088 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 2.870e+01 3.034e+01 3.182e+01 3.757e+01, threshold=6.068e+01, percent-clipped=0.0 2023-12-22 10:38:18,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-22 10:38:34,943 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.913e-03 2023-12-22 10:38:48,508 INFO [train.py:886] (1/4) Epoch 18, batch 350, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 4089555.31 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0 2023-12-22 10:38:51,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542480.0, ans=0.1 2023-12-22 10:38:55,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-12-22 10:39:01,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542546.6666666666, ans=0.1 2023-12-22 10:39:16,663 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.182e-03 2023-12-22 10:39:39,170 INFO [train.py:886] (1/4) Epoch 18, batch 400, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4278727.89 frames. 
], batch size: 100, lr: 6.12e-03, grad_scale: 32.0 2023-12-22 10:39:42,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=542813.3333333334, ans=15.0 2023-12-22 10:39:45,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=542813.3333333334, ans=0.07 2023-12-22 10:39:45,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=542813.3333333334, ans=0.125 2023-12-22 10:39:52,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=542880.0, ans=0.125 2023-12-22 10:39:55,841 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.510e+01 2.811e+01 2.915e+01 3.061e+01 3.426e+01, threshold=5.831e+01, percent-clipped=0.0 2023-12-22 10:40:09,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=543013.3333333334, ans=0.0 2023-12-22 10:40:31,256 INFO [train.py:886] (1/4) Epoch 18, batch 450, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4432494.51 frames. ], batch size: 100, lr: 6.12e-03, grad_scale: 32.0 2023-12-22 10:40:45,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=543213.3333333334, ans=0.2 2023-12-22 10:40:52,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=543280.0, ans=0.125 2023-12-22 10:40:52,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-12-22 10:41:17,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2023-12-22 10:41:18,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=543413.3333333334, ans=0.125 2023-12-22 10:41:19,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=543413.3333333334, ans=0.2 2023-12-22 10:41:23,720 INFO [train.py:886] (1/4) Epoch 18, batch 500, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4553792.39 frames. ], batch size: 100, lr: 6.12e-03, grad_scale: 32.0 2023-12-22 10:41:31,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=543480.0, ans=0.125 2023-12-22 10:41:34,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=543546.6666666666, ans=0.125 2023-12-22 10:41:39,747 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.753e+01 2.847e+01 2.993e+01 3.573e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 10:41:42,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.98 vs. 
limit=15.0 2023-12-22 10:41:54,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=543680.0, ans=0.125 2023-12-22 10:42:06,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=543746.6666666666, ans=0.0 2023-12-22 10:42:14,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=543813.3333333334, ans=0.0 2023-12-22 10:42:15,110 INFO [train.py:886] (1/4) Epoch 18, batch 550, loss[loss=0.01259, audio_tagging_loss=0.01259, over 24057.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4648560.51 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:42:28,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=543880.0, ans=0.125 2023-12-22 10:42:40,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2023-12-22 10:42:44,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=543946.6666666666, ans=0.125 2023-12-22 10:43:07,224 INFO [train.py:886] (1/4) Epoch 18, batch 600, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4711423.94 frames. ], batch size: 99, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:43:13,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=544146.6666666666, ans=0.125 2023-12-22 10:43:14,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=544146.6666666666, ans=0.0 2023-12-22 10:43:18,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=544213.3333333334, ans=0.125 2023-12-22 10:43:24,356 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.858e+01 2.981e+01 3.093e+01 3.735e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 10:43:50,974 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:43:58,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=544480.0, ans=0.125 2023-12-22 10:43:59,137 INFO [train.py:886] (1/4) Epoch 18, batch 650, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4760783.68 frames. 
], batch size: 99, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:44:09,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=544546.6666666666, ans=0.125 2023-12-22 10:44:27,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=544613.3333333334, ans=0.95 2023-12-22 10:44:42,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=544746.6666666666, ans=0.125 2023-12-22 10:44:48,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=544746.6666666666, ans=0.125 2023-12-22 10:44:51,156 INFO [train.py:886] (1/4) Epoch 18, batch 700, loss[loss=0.01361, audio_tagging_loss=0.01361, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4804479.43 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:44:57,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=544813.3333333334, ans=0.125 2023-12-22 10:45:08,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2023-12-22 10:45:08,784 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 2.825e+01 2.944e+01 3.061e+01 3.906e+01, threshold=5.887e+01, percent-clipped=0.0 2023-12-22 10:45:10,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=544880.0, ans=0.025 2023-12-22 10:45:14,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2023-12-22 10:45:23,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=545013.3333333334, ans=0.07 2023-12-22 10:45:27,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=545013.3333333334, ans=0.2 2023-12-22 10:45:44,169 INFO [train.py:886] (1/4) Epoch 18, batch 750, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4838448.99 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:45:45,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=545146.6666666666, ans=0.95 2023-12-22 10:46:04,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=545280.0, ans=0.125 2023-12-22 10:46:10,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=545280.0, ans=0.125 2023-12-22 10:46:24,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=15.0 2023-12-22 10:46:36,898 INFO [train.py:886] (1/4) Epoch 18, batch 800, loss[loss=0.01359, audio_tagging_loss=0.01359, over 22876.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4863865.49 frames. 
], batch size: 107, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:46:42,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=545480.0, ans=0.0 2023-12-22 10:46:46,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=545546.6666666666, ans=0.125 2023-12-22 10:46:52,749 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+01 2.820e+01 2.937e+01 3.093e+01 3.443e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-22 10:46:56,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=545613.3333333334, ans=0.1 2023-12-22 10:46:57,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545613.3333333334, ans=0.1 2023-12-22 10:46:59,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2023-12-22 10:47:10,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545680.0, ans=0.1 2023-12-22 10:47:16,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-22 10:47:25,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=545746.6666666666, ans=0.015 2023-12-22 10:47:27,812 INFO [train.py:886] (1/4) Epoch 18, batch 850, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4879361.13 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0 2023-12-22 10:47:29,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=545813.3333333334, ans=0.125 2023-12-22 10:47:42,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=545880.0, ans=0.2 2023-12-22 10:47:49,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=545946.6666666666, ans=0.125 2023-12-22 10:48:19,673 INFO [train.py:886] (1/4) Epoch 18, batch 900, loss[loss=0.01857, audio_tagging_loss=0.01857, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4895805.02 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0 2023-12-22 10:48:20,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=546146.6666666666, ans=0.125 2023-12-22 10:48:34,037 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:48:34,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=546213.3333333334, ans=0.2 2023-12-22 10:48:35,720 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.855e+01 2.963e+01 3.115e+01 3.581e+01, threshold=5.925e+01, percent-clipped=0.0 2023-12-22 10:48:49,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.39 vs. 
limit=15.0 2023-12-22 10:48:56,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.413e-03 2023-12-22 10:48:56,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=546346.6666666666, ans=0.04949747468305833 2023-12-22 10:49:00,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546413.3333333334, ans=0.1 2023-12-22 10:49:01,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=546413.3333333334, ans=0.125 2023-12-22 10:49:02,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=546413.3333333334, ans=0.0 2023-12-22 10:49:09,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=546480.0, ans=0.05 2023-12-22 10:49:10,084 INFO [train.py:886] (1/4) Epoch 18, batch 950, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4905982.24 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0 2023-12-22 10:49:11,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=546480.0, ans=0.125 2023-12-22 10:49:13,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=546480.0, ans=0.125 2023-12-22 10:49:21,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=546546.6666666666, ans=0.0 2023-12-22 10:49:42,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=546680.0, ans=0.02 2023-12-22 10:49:59,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=546746.6666666666, ans=0.2 2023-12-22 10:50:02,901 INFO [train.py:886] (1/4) Epoch 18, batch 1000, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4908467.62 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0 2023-12-22 10:50:09,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2023-12-22 10:50:10,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=546813.3333333334, ans=0.035 2023-12-22 10:50:14,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2023-12-22 10:50:18,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=546880.0, ans=0.125 2023-12-22 10:50:19,623 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.455e+01 2.848e+01 2.998e+01 3.205e+01 5.243e+01, threshold=5.996e+01, percent-clipped=0.0 2023-12-22 10:50:23,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.63 vs. 
limit=15.0 2023-12-22 10:50:40,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=547013.3333333334, ans=0.0 2023-12-22 10:50:42,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=547013.3333333334, ans=0.1 2023-12-22 10:50:42,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=547080.0, ans=0.125 2023-12-22 10:50:47,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2023-12-22 10:50:48,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=547080.0, ans=0.125 2023-12-22 10:50:53,947 INFO [train.py:886] (1/4) Epoch 18, batch 1050, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4918364.94 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0 2023-12-22 10:51:19,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=547280.0, ans=0.0 2023-12-22 10:51:22,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=547280.0, ans=0.0 2023-12-22 10:51:30,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=547346.6666666666, ans=0.0 2023-12-22 10:51:44,390 INFO [train.py:886] (1/4) Epoch 18, batch 1100, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4926356.75 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0 2023-12-22 10:51:46,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=547480.0, ans=0.0 2023-12-22 10:51:46,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-12-22 10:51:51,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=547480.0, ans=0.1 2023-12-22 10:51:58,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=547546.6666666666, ans=0.2 2023-12-22 10:52:02,014 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.837e+01 2.977e+01 3.116e+01 4.656e+01, threshold=5.954e+01, percent-clipped=0.0 2023-12-22 10:52:06,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=547613.3333333334, ans=0.125 2023-12-22 10:52:25,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=547746.6666666666, ans=0.125 2023-12-22 10:52:30,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=547746.6666666666, ans=0.125 2023-12-22 10:52:36,578 INFO [train.py:886] (1/4) Epoch 18, batch 1150, loss[loss=0.01451, audio_tagging_loss=0.01451, over 25000.00 frames. 
], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4936344.47 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0 2023-12-22 10:52:54,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-12-22 10:52:58,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=547946.6666666666, ans=0.125 2023-12-22 10:53:07,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=548013.3333333334, ans=0.2 2023-12-22 10:53:23,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=548080.0, ans=0.125 2023-12-22 10:53:27,146 INFO [train.py:886] (1/4) Epoch 18, batch 1200, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4948067.19 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0 2023-12-22 10:53:30,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=548146.6666666666, ans=0.0 2023-12-22 10:53:31,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=15.0 2023-12-22 10:53:45,236 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.823e+01 2.963e+01 3.147e+01 3.546e+01, threshold=5.926e+01, percent-clipped=0.0 2023-12-22 10:53:59,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=548346.6666666666, ans=0.125 2023-12-22 10:54:01,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=548346.6666666666, ans=10.0 2023-12-22 10:54:20,542 INFO [train.py:886] (1/4) Epoch 18, batch 1250, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4947083.36 frames. ], batch size: 99, lr: 6.09e-03, grad_scale: 32.0 2023-12-22 10:54:39,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=548546.6666666666, ans=0.125 2023-12-22 10:54:39,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2023-12-22 10:54:59,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=548680.0, ans=0.125 2023-12-22 10:55:01,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2023-12-22 10:55:09,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=548746.6666666666, ans=0.125 2023-12-22 10:55:11,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=548813.3333333334, ans=0.125 2023-12-22 10:55:13,229 INFO [train.py:886] (1/4) Epoch 18, batch 1300, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24047.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4943818.69 frames. 
], batch size: 100, lr: 6.09e-03, grad_scale: 32.0 2023-12-22 10:55:24,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=548880.0, ans=0.125 2023-12-22 10:55:24,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=548880.0, ans=0.0 2023-12-22 10:55:29,145 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 2.885e+01 3.017e+01 3.228e+01 3.822e+01, threshold=6.035e+01, percent-clipped=0.0 2023-12-22 10:55:55,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=549080.0, ans=0.125 2023-12-22 10:56:01,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=549080.0, ans=10.0 2023-12-22 10:56:03,809 INFO [train.py:886] (1/4) Epoch 18, batch 1350, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4944927.02 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 10:56:14,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=549213.3333333334, ans=0.125 2023-12-22 10:56:24,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=549213.3333333334, ans=0.2 2023-12-22 10:56:25,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=549280.0, ans=0.125 2023-12-22 10:56:50,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=549413.3333333334, ans=0.125 2023-12-22 10:56:57,170 INFO [train.py:886] (1/4) Epoch 18, batch 1400, loss[loss=0.0179, audio_tagging_loss=0.0179, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4946011.94 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 10:57:10,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549546.6666666666, ans=0.1 2023-12-22 10:57:11,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-12-22 10:57:13,170 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.790e+01 2.898e+01 3.128e+01 3.665e+01, threshold=5.796e+01, percent-clipped=0.0 2023-12-22 10:57:19,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.83 vs. limit=22.5 2023-12-22 10:57:24,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2023-12-22 10:57:38,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=549746.6666666666, ans=0.0 2023-12-22 10:57:48,211 INFO [train.py:886] (1/4) Epoch 18, batch 1450, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4952551.65 frames. 
], batch size: 100, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 10:57:52,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2023-12-22 10:58:20,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=550013.3333333334, ans=0.0 2023-12-22 10:58:30,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=550080.0, ans=0.2 2023-12-22 10:58:31,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=550080.0, ans=0.1 2023-12-22 10:58:40,470 INFO [train.py:886] (1/4) Epoch 18, batch 1500, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4954470.67 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 10:58:46,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=12.0 2023-12-22 10:58:56,300 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.873e+01 2.986e+01 3.160e+01 3.608e+01, threshold=5.972e+01, percent-clipped=0.0 2023-12-22 10:59:17,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=550346.6666666666, ans=0.0 2023-12-22 10:59:26,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=550413.3333333334, ans=0.0 2023-12-22 10:59:27,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=550413.3333333334, ans=0.2 2023-12-22 10:59:31,568 INFO [train.py:886] (1/4) Epoch 18, batch 1550, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4953938.06 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 10:59:41,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-12-22 10:59:54,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=550613.3333333334, ans=0.04949747468305833 2023-12-22 10:59:57,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=550613.3333333334, ans=0.0 2023-12-22 10:59:58,858 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.514e-03 2023-12-22 10:59:58,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550613.3333333334, ans=0.125 2023-12-22 11:00:06,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.90 vs. limit=15.0 2023-12-22 11:00:15,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. 
limit=10.0 2023-12-22 11:00:15,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=550746.6666666666, ans=0.2 2023-12-22 11:00:17,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=550746.6666666666, ans=0.125 2023-12-22 11:00:23,825 INFO [train.py:886] (1/4) Epoch 18, batch 1600, loss[loss=0.01581, audio_tagging_loss=0.01581, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4946689.82 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 11:00:40,563 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.899e+01 3.029e+01 3.152e+01 3.450e+01, threshold=6.059e+01, percent-clipped=0.0 2023-12-22 11:00:49,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=550946.6666666666, ans=0.125 2023-12-22 11:00:49,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.92 vs. limit=22.5 2023-12-22 11:00:55,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. limit=22.5 2023-12-22 11:01:03,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=551013.3333333334, ans=0.0 2023-12-22 11:01:13,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=551080.0, ans=0.125 2023-12-22 11:01:14,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=551080.0, ans=0.0 2023-12-22 11:01:15,675 INFO [train.py:886] (1/4) Epoch 18, batch 1650, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4947061.95 frames. ], batch size: 99, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:01:18,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551146.6666666666, ans=0.0 2023-12-22 11:01:23,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0 2023-12-22 11:01:26,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=551213.3333333334, ans=0.1 2023-12-22 11:01:33,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=551213.3333333334, ans=0.0 2023-12-22 11:01:54,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=551346.6666666666, ans=0.125 2023-12-22 11:02:01,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551413.3333333334, ans=0.1 2023-12-22 11:02:05,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.94 vs. limit=10.0 2023-12-22 11:02:08,344 INFO [train.py:886] (1/4) Epoch 18, batch 1700, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. 
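The Whitening: ... metric=X vs. limit=Y lines compare a per-module whiteness statistic against a scheduled limit; a corrective penalty only applies when the metric exceeds the limit, which is why most of the reported metrics sit at or just below it. One plausible way to compute such a statistic, shown for illustration (scaling.py's exact formula may differ):

    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels). Returns mean(eig^2) / (mean(eig))^2
        # of the channel covariance: exactly 1.0 when the covariance is a
        # multiple of the identity ("white"), and larger as the eigenvalue
        # spectrum becomes more uneven.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        sum_eig = cov.diagonal().sum()      # trace(C)  = sum of eigenvalues
        sum_eig_sq = (cov * cov).sum()      # ||C||_F^2 = sum of squared eigenvalues
        return (sum_eig_sq / d) / (sum_eig / d) ** 2

    x = torch.randn(1000, 256)              # near-white input -> metric close to 1
    metric, limit = float(whitening_metric(x)), 15.0
    print(f"metric={metric:.2f} vs. limit={limit}")  # same shape as the log lines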
], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4952478.41 frames. ], batch size: 99, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:02:12,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=551480.0, ans=0.125 2023-12-22 11:02:21,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=551546.6666666666, ans=0.125 2023-12-22 11:02:24,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=551546.6666666666, ans=0.2 2023-12-22 11:02:24,928 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.844e+01 2.991e+01 3.158e+01 3.709e+01, threshold=5.982e+01, percent-clipped=0.0 2023-12-22 11:02:32,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551613.3333333334, ans=0.125 2023-12-22 11:02:34,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=551613.3333333334, ans=0.0 2023-12-22 11:02:38,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=551680.0, ans=0.0 2023-12-22 11:02:44,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=551680.0, ans=0.125 2023-12-22 11:02:47,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551680.0, ans=0.1 2023-12-22 11:02:57,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=551746.6666666666, ans=0.0 2023-12-22 11:02:59,965 INFO [train.py:886] (1/4) Epoch 18, batch 1750, loss[loss=0.009966, audio_tagging_loss=0.009966, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4955678.61 frames. ], batch size: 100, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:03:16,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=551880.0, ans=0.125 2023-12-22 11:03:25,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2023-12-22 11:03:26,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-22 11:03:52,488 INFO [train.py:886] (1/4) Epoch 18, batch 1800, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4952598.24 frames. 
], batch size: 100, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:04:03,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=552213.3333333334, ans=0.125 2023-12-22 11:04:04,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=552213.3333333334, ans=0.125 2023-12-22 11:04:08,385 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 2.840e+01 2.988e+01 3.166e+01 4.187e+01, threshold=5.976e+01, percent-clipped=0.0 2023-12-22 11:04:14,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=552280.0, ans=0.125 2023-12-22 11:04:17,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=552280.0, ans=0.09899494936611666 2023-12-22 11:04:24,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552346.6666666666, ans=0.125 2023-12-22 11:04:33,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=552413.3333333334, ans=0.125 2023-12-22 11:04:44,142 INFO [train.py:886] (1/4) Epoch 18, batch 1850, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4948816.43 frames. ], batch size: 99, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:04:47,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=552480.0, ans=0.0 2023-12-22 11:04:49,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=552480.0, ans=0.125 2023-12-22 11:05:04,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=552613.3333333334, ans=0.0 2023-12-22 11:05:20,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552680.0, ans=0.125 2023-12-22 11:05:22,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2023-12-22 11:05:35,440 INFO [train.py:886] (1/4) Epoch 18, batch 1900, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4941858.92 frames. 
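The recurring WARNING [optim.py:484] lines report quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, the clipping threshold currently in force, and the fraction of recent batches that were actually clipped (percent-clipped). A simplified sketch of that bookkeeping; the history length and the threshold rule (a multiple of the median here) are assumptions, since the real logic is internal to icefall's optimizer:

    from collections import deque
    import torch

    norm_history = deque(maxlen=128)        # window length assumed

    def clip_and_report(parameters, multiplier=2.0):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.detach().norm() for g in grads]))
        norm_history.append(float(norm))
        qs = torch.quantile(torch.tensor(list(norm_history)),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = multiplier * float(qs[2])   # threshold rule assumed (2x median)
        if float(norm) > threshold:             # counted toward "percent-clipped"
            for g in grads:
                g.mul_(threshold / float(norm))
        return qs, threshold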
], batch size: 99, lr: 6.06e-03, grad_scale: 32.0 2023-12-22 11:05:48,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552880.0, ans=0.125 2023-12-22 11:05:52,655 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.849e+01 3.028e+01 3.160e+01 3.565e+01, threshold=6.056e+01, percent-clipped=0.0 2023-12-22 11:05:53,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552880.0, ans=0.125 2023-12-22 11:05:55,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552946.6666666666, ans=0.125 2023-12-22 11:06:00,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552946.6666666666, ans=0.1 2023-12-22 11:06:06,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553013.3333333334, ans=0.1 2023-12-22 11:06:07,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=553013.3333333334, ans=0.0 2023-12-22 11:06:08,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=553013.3333333334, ans=0.125 2023-12-22 11:06:16,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2023-12-22 11:06:26,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-12-22 11:06:27,549 INFO [train.py:886] (1/4) Epoch 18, batch 1950, loss[loss=0.01356, audio_tagging_loss=0.01356, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4940886.55 frames. ], batch size: 99, lr: 6.06e-03, grad_scale: 32.0 2023-12-22 11:07:03,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=553346.6666666666, ans=0.0 2023-12-22 11:07:04,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0 2023-12-22 11:07:12,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=553413.3333333334, ans=0.0 2023-12-22 11:07:18,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-12-22 11:07:18,703 INFO [train.py:886] (1/4) Epoch 18, batch 2000, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4946563.67 frames. 
], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:07:22,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=553480.0, ans=0.125 2023-12-22 11:07:25,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=553480.0, ans=0.125 2023-12-22 11:07:35,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-12-22 11:07:35,970 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.820e+01 2.955e+01 3.133e+01 3.622e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 11:07:45,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=553613.3333333334, ans=0.125 2023-12-22 11:07:47,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=553613.3333333334, ans=10.0 2023-12-22 11:08:08,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=553746.6666666666, ans=0.025 2023-12-22 11:08:11,139 INFO [train.py:886] (1/4) Epoch 18, batch 2050, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4947967.08 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:08:17,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=553813.3333333334, ans=0.09899494936611666 2023-12-22 11:08:33,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=553946.6666666666, ans=0.05 2023-12-22 11:08:47,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=554013.3333333334, ans=0.0 2023-12-22 11:08:55,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=554080.0, ans=0.125 2023-12-22 11:09:01,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.29 vs. limit=22.5 2023-12-22 11:09:02,569 INFO [train.py:886] (1/4) Epoch 18, batch 2100, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4946720.34 frames. 
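The grad_scale value doubling from 32.0 to 64.0 at batch 2000 above is the expected behaviour of dynamic loss scaling under fp16 training: PyTorch's GradScaler multiplies the scale by its growth factor after a fixed number of consecutive overflow-free steps, and halves it whenever an overflow is detected. A minimal, self-contained illustration; the model, learning rate, and growth_interval are placeholders, not this run's actual configuration:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=1000)
    model = torch.nn.Linear(80, 527).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(2000):
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(8, 80, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(opt)                # unscales grads; skips the step on inf/NaN
        scaler.update()                 # doubles the scale every 1000 clean steps
    print(scaler.get_scale())           # 32 -> 64 -> 128 over 2000 clean steps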
], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:09:18,378 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+01 2.806e+01 2.962e+01 3.157e+01 3.752e+01, threshold=5.925e+01, percent-clipped=0.0 2023-12-22 11:09:19,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=554213.3333333334, ans=0.125 2023-12-22 11:09:35,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=554346.6666666666, ans=0.5 2023-12-22 11:09:46,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=554413.3333333334, ans=0.0 2023-12-22 11:09:46,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=554413.3333333334, ans=0.0 2023-12-22 11:09:53,762 INFO [train.py:886] (1/4) Epoch 18, batch 2150, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4951957.49 frames. ], batch size: 99, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:09:57,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=554480.0, ans=0.0 2023-12-22 11:10:26,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=554680.0, ans=0.0 2023-12-22 11:10:47,197 INFO [train.py:886] (1/4) Epoch 18, batch 2200, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4946091.46 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:10:56,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554880.0, ans=0.0 2023-12-22 11:10:56,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=554880.0, ans=0.125 2023-12-22 11:11:02,287 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.880e+01 3.019e+01 3.160e+01 3.670e+01, threshold=6.038e+01, percent-clipped=0.0 2023-12-22 11:11:38,054 INFO [train.py:886] (1/4) Epoch 18, batch 2250, loss[loss=0.01059, audio_tagging_loss=0.01059, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4941748.63 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:11:40,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=555146.6666666666, ans=10.0 2023-12-22 11:11:44,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. 
limit=15.0 2023-12-22 11:11:50,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555213.3333333334, ans=0.125 2023-12-22 11:11:51,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=555213.3333333334, ans=0.0 2023-12-22 11:11:57,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=555213.3333333334, ans=0.2 2023-12-22 11:12:02,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=555280.0, ans=0.0 2023-12-22 11:12:12,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=555346.6666666666, ans=0.0 2023-12-22 11:12:13,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=555346.6666666666, ans=0.05 2023-12-22 11:12:15,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=555346.6666666666, ans=0.2 2023-12-22 11:12:26,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=555413.3333333334, ans=0.125 2023-12-22 11:12:30,211 INFO [train.py:886] (1/4) Epoch 18, batch 2300, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4944518.68 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:12:33,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=555480.0, ans=0.125 2023-12-22 11:12:38,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=555546.6666666666, ans=0.0 2023-12-22 11:12:45,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=555546.6666666666, ans=0.125 2023-12-22 11:12:46,748 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+01 2.767e+01 2.927e+01 3.038e+01 3.631e+01, threshold=5.853e+01, percent-clipped=0.0 2023-12-22 11:12:46,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=555546.6666666666, ans=0.0 2023-12-22 11:13:21,709 INFO [train.py:886] (1/4) Epoch 18, batch 2350, loss[loss=0.01576, audio_tagging_loss=0.01576, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4944330.83 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:13:22,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=555813.3333333334, ans=0.125 2023-12-22 11:13:27,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=555813.3333333334, ans=0.09899494936611666 2023-12-22 11:13:42,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=555946.6666666666, ans=0.0 2023-12-22 11:13:45,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. 
limit=10.0 2023-12-22 11:13:48,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555946.6666666666, ans=0.125 2023-12-22 11:13:55,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2023-12-22 11:14:05,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=556080.0, ans=0.125 2023-12-22 11:14:12,065 INFO [train.py:886] (1/4) Epoch 18, batch 2400, loss[loss=0.01314, audio_tagging_loss=0.01314, over 21941.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4943719.10 frames. ], batch size: 107, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:14:18,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=556146.6666666666, ans=0.1 2023-12-22 11:14:29,875 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 2.855e+01 2.962e+01 3.074e+01 3.618e+01, threshold=5.924e+01, percent-clipped=0.0 2023-12-22 11:15:05,109 INFO [train.py:886] (1/4) Epoch 18, batch 2450, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4949052.92 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:15:22,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-12-22 11:15:31,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=15.0 2023-12-22 11:15:45,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=556746.6666666666, ans=0.1 2023-12-22 11:15:56,170 INFO [train.py:886] (1/4) Epoch 18, batch 2500, loss[loss=0.0135, audio_tagging_loss=0.0135, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4950316.18 frames. ], batch size: 99, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:16:09,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556880.0, ans=0.1 2023-12-22 11:16:13,365 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.591e+01 2.911e+01 3.040e+01 3.167e+01 3.837e+01, threshold=6.080e+01, percent-clipped=0.0 2023-12-22 11:16:15,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-12-22 11:16:18,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.89 vs. 
limit=22.5 2023-12-22 11:16:20,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=556946.6666666666, ans=0.125 2023-12-22 11:16:21,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556946.6666666666, ans=0.1 2023-12-22 11:16:30,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=557013.3333333334, ans=0.0 2023-12-22 11:16:48,289 INFO [train.py:886] (1/4) Epoch 18, batch 2550, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4941924.59 frames. ], batch size: 99, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:16:49,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-12-22 11:16:57,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 11:17:06,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=557213.3333333334, ans=0.125 2023-12-22 11:17:12,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=557280.0, ans=0.0 2023-12-22 11:17:23,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557346.6666666666, ans=0.1 2023-12-22 11:17:40,767 INFO [train.py:886] (1/4) Epoch 18, batch 2600, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4944926.64 frames. ], batch size: 99, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:17:41,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-22 11:17:52,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=557546.6666666666, ans=0.0 2023-12-22 11:17:56,619 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.461e+01 2.828e+01 2.988e+01 3.124e+01 3.523e+01, threshold=5.975e+01, percent-clipped=0.0 2023-12-22 11:17:56,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=557546.6666666666, ans=0.0 2023-12-22 11:17:58,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=557546.6666666666, ans=0.04949747468305833 2023-12-22 11:18:00,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=557613.3333333334, ans=0.0 2023-12-22 11:18:27,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=12.0 2023-12-22 11:18:32,521 INFO [train.py:886] (1/4) Epoch 18, batch 2650, loss[loss=0.01633, audio_tagging_loss=0.01633, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4948564.15 frames. 
], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:18:32,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=557813.3333333334, ans=0.025 2023-12-22 11:18:35,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=557813.3333333334, ans=0.2 2023-12-22 11:18:48,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=557880.0, ans=0.125 2023-12-22 11:19:11,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=558013.3333333334, ans=0.0 2023-12-22 11:19:12,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558013.3333333334, ans=0.1 2023-12-22 11:19:13,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=558080.0, ans=22.5 2023-12-22 11:19:24,361 INFO [train.py:886] (1/4) Epoch 18, batch 2700, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4952596.90 frames. ], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:19:39,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.13 vs. limit=22.5 2023-12-22 11:19:40,799 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 2.868e+01 2.973e+01 3.116e+01 3.614e+01, threshold=5.947e+01, percent-clipped=0.0 2023-12-22 11:20:05,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=558413.3333333334, ans=10.0 2023-12-22 11:20:09,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=558413.3333333334, ans=0.07 2023-12-22 11:20:16,854 INFO [train.py:886] (1/4) Epoch 18, batch 2750, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4948786.91 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 64.0 2023-12-22 11:20:21,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=558480.0, ans=0.0 2023-12-22 11:20:26,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.39 vs. limit=12.0 2023-12-22 11:20:37,637 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:20:41,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558613.3333333334, ans=0.1 2023-12-22 11:20:51,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.80 vs. limit=22.5 2023-12-22 11:21:02,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=558746.6666666666, ans=0.125 2023-12-22 11:21:08,770 INFO [train.py:886] (1/4) Epoch 18, batch 2800, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. 
], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4945034.82 frames. ], batch size: 99, lr: 6.03e-03, grad_scale: 64.0 2023-12-22 11:21:17,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=558813.3333333334, ans=0.125 2023-12-22 11:21:26,345 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+01 2.878e+01 3.022e+01 3.166e+01 3.737e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 11:21:45,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=559013.3333333334, ans=0.2 2023-12-22 11:21:53,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-12-22 11:22:00,833 INFO [train.py:886] (1/4) Epoch 18, batch 2850, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4943996.76 frames. ], batch size: 99, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:22:16,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=559213.3333333334, ans=0.0 2023-12-22 11:22:35,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=559346.6666666666, ans=0.0 2023-12-22 11:22:42,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=559413.3333333334, ans=0.125 2023-12-22 11:22:52,677 INFO [train.py:886] (1/4) Epoch 18, batch 2900, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4941523.46 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:22:55,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=559480.0, ans=0.125 2023-12-22 11:22:56,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-12-22 11:23:00,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=559480.0, ans=0.125 2023-12-22 11:23:10,144 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+01 2.901e+01 3.031e+01 3.155e+01 3.612e+01, threshold=6.062e+01, percent-clipped=0.0 2023-12-22 11:23:13,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-12-22 11:23:37,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=559746.6666666666, ans=0.125 2023-12-22 11:23:38,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2023-12-22 11:23:44,253 INFO [train.py:886] (1/4) Epoch 18, batch 2950, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4946179.80 frames. 
], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:24:01,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559880.0, ans=0.1 2023-12-22 11:24:20,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560013.3333333334, ans=0.125 2023-12-22 11:24:32,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=560080.0, ans=0.2 2023-12-22 11:24:38,503 INFO [train.py:886] (1/4) Epoch 18, batch 3000, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4949504.27 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:24:38,504 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 11:24:56,031 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.1429, 3.0537, 3.5310, 3.6533], device='cuda:1') 2023-12-22 11:24:59,975 INFO [train.py:917] (1/4) Epoch 18, validation: loss=0.03414, audio_tagging_loss=0.03414, over 3737520.00 frames. 2023-12-22 11:24:59,975 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 11:25:16,889 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.845e+01 2.970e+01 3.099e+01 3.862e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 11:25:31,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=560346.6666666666, ans=0.0 2023-12-22 11:25:50,264 INFO [train.py:886] (1/4) Epoch 18, batch 3050, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4957808.07 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:25:58,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=560546.6666666666, ans=0.125 2023-12-22 11:26:12,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=560613.3333333334, ans=0.2 2023-12-22 11:26:12,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=560613.3333333334, ans=0.1 2023-12-22 11:26:17,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560613.3333333334, ans=0.1 2023-12-22 11:26:19,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=560613.3333333334, ans=0.0 2023-12-22 11:26:33,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=560746.6666666666, ans=0.0 2023-12-22 11:26:42,427 INFO [train.py:886] (1/4) Epoch 18, batch 3100, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4963179.16 frames. 
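The validation pass logged above runs the model over the fixed dev set under no_grad, which is why the reported frame count (3737520.00) is identical at every validation point while the training frame totals fluctuate. A rough sketch of such a pass; the batch field names and the model's call interface are assumptions for illustration only:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        for batch in dev_loader:
            feats = batch["inputs"].to(device)       # field name assumed
            labels = batch["labels"].to(device)      # field name assumed
            loss, num_frames = model(feats, labels)  # interface assumed
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames                 # the reported "validation: loss=..."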
], batch size: 99, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:26:42,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560813.3333333334, ans=0.1 2023-12-22 11:26:48,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=12.0 2023-12-22 11:26:52,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.11 vs. limit=22.5 2023-12-22 11:26:59,217 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+01 2.848e+01 3.018e+01 3.166e+01 3.503e+01, threshold=6.036e+01, percent-clipped=0.0 2023-12-22 11:26:59,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560880.0, ans=0.1 2023-12-22 11:27:03,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=560946.6666666666, ans=0.0 2023-12-22 11:27:11,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=560946.6666666666, ans=0.1 2023-12-22 11:27:32,487 INFO [train.py:886] (1/4) Epoch 18, batch 3150, loss[loss=0.01658, audio_tagging_loss=0.01658, over 24952.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4959020.19 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:27:51,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-12-22 11:27:55,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=561280.0, ans=0.0 2023-12-22 11:28:12,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=561346.6666666666, ans=0.2 2023-12-22 11:28:23,927 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:28:25,747 INFO [train.py:886] (1/4) Epoch 18, batch 3200, loss[loss=0.01787, audio_tagging_loss=0.01787, over 24933.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4956555.45 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:28:42,523 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.819e+01 2.957e+01 3.105e+01 3.510e+01, threshold=5.913e+01, percent-clipped=0.0 2023-12-22 11:29:16,277 INFO [train.py:886] (1/4) Epoch 18, batch 3250, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4957126.06 frames. 
], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:29:44,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=561946.6666666666, ans=0.125 2023-12-22 11:30:02,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=562080.0, ans=0.0 2023-12-22 11:30:06,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562146.6666666666, ans=0.125 2023-12-22 11:30:07,139 INFO [train.py:886] (1/4) Epoch 18, batch 3300, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4954597.50 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:30:24,547 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 2.841e+01 2.981e+01 3.120e+01 3.559e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 11:30:38,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=562346.6666666666, ans=0.0 2023-12-22 11:30:53,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2023-12-22 11:30:58,508 INFO [train.py:886] (1/4) Epoch 18, batch 3350, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4951330.28 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:31:14,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2023-12-22 11:31:20,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=562613.3333333334, ans=0.0 2023-12-22 11:31:47,322 INFO [train.py:886] (1/4) Epoch 18, batch 3400, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4950753.56 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:31:49,380 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:31:52,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=562813.3333333334, ans=0.125 2023-12-22 11:32:03,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.35 vs. limit=10.0 2023-12-22 11:32:04,797 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+01 2.896e+01 3.047e+01 3.185e+01 3.707e+01, threshold=6.093e+01, percent-clipped=0.0 2023-12-22 11:32:08,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=562946.6666666666, ans=0.0 2023-12-22 11:32:26,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563013.3333333334, ans=0.0 2023-12-22 11:32:26,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=12.0 2023-12-22 11:32:27,356 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:32:38,976 INFO [train.py:886] (1/4) Epoch 18, batch 3450, loss[loss=0.0154, audio_tagging_loss=0.0154, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4946679.56 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:32:58,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-12-22 11:32:59,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=563280.0, ans=0.125 2023-12-22 11:33:08,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2023-12-22 11:33:10,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=563346.6666666666, ans=15.0 2023-12-22 11:33:10,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=563346.6666666666, ans=10.0 2023-12-22 11:33:27,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563413.3333333334, ans=0.0 2023-12-22 11:33:29,160 INFO [train.py:886] (1/4) Epoch 18, batch 3500, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4941883.14 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:33:40,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563546.6666666666, ans=0.0 2023-12-22 11:33:46,474 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.893e+01 3.039e+01 3.166e+01 3.781e+01, threshold=6.078e+01, percent-clipped=0.0 2023-12-22 11:33:49,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-12-22 11:33:51,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=563613.3333333334, ans=0.125 2023-12-22 11:34:05,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=563680.0, ans=0.0 2023-12-22 11:34:12,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=563746.6666666666, ans=0.2 2023-12-22 11:34:14,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=563746.6666666666, ans=0.125 2023-12-22 11:34:20,061 INFO [train.py:886] (1/4) Epoch 18, batch 3550, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4944725.94 frames. 
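Since audio tagging is a multi-label problem (a clip can carry several simultaneous event labels), the audio_tagging_loss values throughout this log are consistent with a per-class binary cross-entropy criterion over the 527 AudioSet event classes. A generic sketch of such a criterion; the shapes and labelling below are illustrative, not the recipe's actual data layout:

    import torch

    criterion = torch.nn.BCEWithLogitsLoss(reduction="mean")
    logits = torch.randn(100, 527)            # (clips, classes); shapes illustrative
    targets = torch.zeros(100, 527)
    targets[torch.arange(100), torch.randint(0, 527, (100,))] = 1.0  # one label each
    # Untrained random logits give a large loss; a trained model's per-class BCE
    # settles near the ~0.014 values reported above.
    print(float(criterion(logits, targets)))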
], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:34:27,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=563813.3333333334, ans=0.2 2023-12-22 11:34:41,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=563946.6666666666, ans=0.025 2023-12-22 11:35:05,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=564080.0, ans=0.0 2023-12-22 11:35:11,886 INFO [train.py:886] (1/4) Epoch 18, batch 3600, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4947125.07 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:35:14,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-12-22 11:35:21,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=564213.3333333334, ans=0.0 2023-12-22 11:35:28,818 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.848e+01 2.990e+01 3.158e+01 3.756e+01, threshold=5.981e+01, percent-clipped=0.0 2023-12-22 11:36:03,815 INFO [train.py:886] (1/4) Epoch 18, batch 3650, loss[loss=0.01419, audio_tagging_loss=0.01419, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4943427.33 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:36:04,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=564480.0, ans=0.5 2023-12-22 11:36:23,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=564546.6666666666, ans=0.125 2023-12-22 11:36:26,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=564613.3333333334, ans=0.125 2023-12-22 11:36:28,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2023-12-22 11:36:40,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=564680.0, ans=0.125 2023-12-22 11:36:56,217 INFO [train.py:886] (1/4) Epoch 18, batch 3700, loss[loss=0.01207, audio_tagging_loss=0.01207, over 22679.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4945728.76 frames. 
], batch size: 107, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:36:59,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=564813.3333333334, ans=0.125 2023-12-22 11:37:13,791 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+01 2.838e+01 2.944e+01 3.115e+01 3.574e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 11:37:15,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=564880.0, ans=0.125 2023-12-22 11:37:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=564946.6666666666, ans=15.0 2023-12-22 11:37:28,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=565013.3333333334, ans=0.1 2023-12-22 11:37:47,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=565146.6666666666, ans=0.125 2023-12-22 11:37:48,060 INFO [train.py:886] (1/4) Epoch 18, batch 3750, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4943255.69 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:38:03,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565213.3333333334, ans=0.125 2023-12-22 11:38:39,883 INFO [train.py:886] (1/4) Epoch 18, batch 3800, loss[loss=0.01191, audio_tagging_loss=0.01191, over 22249.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4935183.52 frames. ], batch size: 107, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:38:57,911 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.848e+01 2.976e+01 3.119e+01 3.728e+01, threshold=5.951e+01, percent-clipped=0.0 2023-12-22 11:39:11,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=565680.0, ans=0.1 2023-12-22 11:39:19,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2023-12-22 11:39:31,788 INFO [train.py:886] (1/4) Epoch 18, batch 3850, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4932333.31 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:39:33,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=565813.3333333334, ans=0.0 2023-12-22 11:39:47,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=565880.0, ans=0.125 2023-12-22 11:39:59,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=565946.6666666666, ans=0.125 2023-12-22 11:40:00,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=565946.6666666666, ans=0.0 2023-12-22 11:40:01,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.61 vs. 
limit=15.0 2023-12-22 11:40:05,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=566013.3333333334, ans=0.0 2023-12-22 11:40:13,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566080.0, ans=0.125 2023-12-22 11:40:14,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=566080.0, ans=0.125 2023-12-22 11:40:24,663 INFO [train.py:886] (1/4) Epoch 18, batch 3900, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4933446.19 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:40:24,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=566146.6666666666, ans=0.0 2023-12-22 11:40:31,464 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:40:41,628 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.561e+01 2.851e+01 2.964e+01 3.170e+01 3.515e+01, threshold=5.928e+01, percent-clipped=0.0 2023-12-22 11:40:46,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=566280.0, ans=0.125 2023-12-22 11:41:15,956 INFO [train.py:886] (1/4) Epoch 18, batch 3950, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4940493.49 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:41:17,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=566480.0, ans=0.125 2023-12-22 11:41:20,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=566480.0, ans=6.0 2023-12-22 11:41:24,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=566480.0, ans=15.0 2023-12-22 11:41:41,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=566613.3333333334, ans=0.2 2023-12-22 11:41:45,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=566613.3333333334, ans=0.0 2023-12-22 11:42:00,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566746.6666666666, ans=0.125 2023-12-22 11:42:09,139 INFO [train.py:886] (1/4) Epoch 18, batch 4000, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4950043.95 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:42:20,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=566880.0, ans=0.125 2023-12-22 11:42:25,332 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.905e+01 3.054e+01 3.173e+01 3.628e+01, threshold=6.109e+01, percent-clipped=0.0 2023-12-22 11:42:59,847 INFO [train.py:886] (1/4) Epoch 18, batch 4050, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. 
], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4949746.81 frames. ], batch size: 99, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:43:49,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=567413.3333333334, ans=0.2 2023-12-22 11:43:49,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=567413.3333333334, ans=0.0 2023-12-22 11:43:52,211 INFO [train.py:886] (1/4) Epoch 18, batch 4100, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4950576.10 frames. ], batch size: 99, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:43:55,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=567480.0, ans=0.125 2023-12-22 11:43:56,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2023-12-22 11:43:58,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=567480.0, ans=0.0 2023-12-22 11:44:02,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=567546.6666666666, ans=0.0 2023-12-22 11:44:09,653 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.648e+01 2.912e+01 3.022e+01 3.164e+01 3.761e+01, threshold=6.044e+01, percent-clipped=0.0 2023-12-22 11:44:14,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=567613.3333333334, ans=0.02 2023-12-22 11:44:43,780 INFO [train.py:886] (1/4) Epoch 18, batch 4150, loss[loss=0.01508, audio_tagging_loss=0.01508, over 24058.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4940653.22 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:44:54,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=567880.0, ans=0.125 2023-12-22 11:45:03,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567946.6666666666, ans=0.1 2023-12-22 11:45:10,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2023-12-22 11:45:24,701 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.116e-03 2023-12-22 11:45:30,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568080.0, ans=0.1 2023-12-22 11:45:33,874 INFO [train.py:886] (1/4) Epoch 18, batch 4200, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4943738.39 frames. ], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:45:52,749 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.822e+01 2.922e+01 3.153e+01 3.772e+01, threshold=5.845e+01, percent-clipped=0.0 2023-12-22 11:45:55,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. 
limit=15.0 2023-12-22 11:46:11,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568346.6666666666, ans=0.1 2023-12-22 11:46:22,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2023-12-22 11:46:27,023 INFO [train.py:886] (1/4) Epoch 18, batch 4250, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4952205.60 frames. ], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:46:31,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=568480.0, ans=0.125 2023-12-22 11:46:46,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=568613.3333333334, ans=0.2 2023-12-22 11:46:53,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=568613.3333333334, ans=0.2 2023-12-22 11:46:57,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=568680.0, ans=0.125 2023-12-22 11:47:17,863 INFO [train.py:886] (1/4) Epoch 18, batch 4300, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4948302.40 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:47:27,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=568880.0, ans=0.125 2023-12-22 11:47:36,148 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.510e+01 2.847e+01 2.955e+01 3.077e+01 3.608e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 11:48:00,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569080.0, ans=0.1 2023-12-22 11:48:10,195 INFO [train.py:886] (1/4) Epoch 18, batch 4350, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4954458.77 frames. ], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:48:15,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=569146.6666666666, ans=0.0 2023-12-22 11:48:28,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569213.3333333334, ans=0.125 2023-12-22 11:48:36,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=569280.0, ans=0.0 2023-12-22 11:48:40,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2023-12-22 11:48:42,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. 
limit=12.0 2023-12-22 11:48:45,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569346.6666666666, ans=0.1 2023-12-22 11:48:56,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-12-22 11:48:59,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569413.3333333334, ans=0.125 2023-12-22 11:49:00,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=569413.3333333334, ans=0.07 2023-12-22 11:49:01,974 INFO [train.py:886] (1/4) Epoch 18, batch 4400, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24064.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4953912.19 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:49:17,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=569546.6666666666, ans=0.0 2023-12-22 11:49:19,485 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.944e+01 3.084e+01 3.234e+01 4.247e+01, threshold=6.167e+01, percent-clipped=0.0 2023-12-22 11:49:23,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569613.3333333334, ans=0.125 2023-12-22 11:49:40,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=569680.0, ans=0.0 2023-12-22 11:49:52,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569746.6666666666, ans=0.1 2023-12-22 11:49:53,664 INFO [train.py:886] (1/4) Epoch 18, batch 4450, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4946285.92 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:49:54,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=569813.3333333334, ans=0.125 2023-12-22 11:50:14,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=12.0 2023-12-22 11:50:26,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-12-22 11:50:28,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-22 11:50:30,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2023-12-22 11:50:44,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2023-12-22 11:50:45,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. 
limit=15.0 2023-12-22 11:50:45,782 INFO [train.py:886] (1/4) Epoch 18, batch 4500, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4945232.02 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:50:46,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=570146.6666666666, ans=0.125 2023-12-22 11:50:53,616 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.772e-03 2023-12-22 11:51:03,229 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+01 2.863e+01 2.984e+01 3.180e+01 3.715e+01, threshold=5.969e+01, percent-clipped=0.0 2023-12-22 11:51:06,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=570280.0, ans=0.125 2023-12-22 11:51:18,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=570346.6666666666, ans=0.125 2023-12-22 11:51:20,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2023-12-22 11:51:37,357 INFO [train.py:886] (1/4) Epoch 18, batch 4550, loss[loss=0.01467, audio_tagging_loss=0.01467, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4953627.21 frames. ], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:51:47,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=570546.6666666666, ans=0.09899494936611666 2023-12-22 11:51:57,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-12-22 11:52:11,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=570680.0, ans=0.2 2023-12-22 11:52:12,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=570680.0, ans=0.0 2023-12-22 11:52:20,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570746.6666666666, ans=0.1 2023-12-22 11:52:29,120 INFO [train.py:886] (1/4) Epoch 18, batch 4600, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4953742.28 frames. 
], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:52:31,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=570813.3333333334, ans=0.0 2023-12-22 11:52:33,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=570813.3333333334, ans=0.125 2023-12-22 11:52:41,055 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:52:41,091 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:52:46,540 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.514e+01 2.915e+01 3.083e+01 3.222e+01 3.663e+01, threshold=6.165e+01, percent-clipped=0.0 2023-12-22 11:52:47,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=570880.0, ans=0.125 2023-12-22 11:53:16,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571080.0, ans=0.1 2023-12-22 11:53:20,550 INFO [train.py:886] (1/4) Epoch 18, batch 4650, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4956787.60 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:53:26,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=571146.6666666666, ans=0.025 2023-12-22 11:53:35,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=571213.3333333334, ans=0.0 2023-12-22 11:53:40,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=571280.0, ans=0.125 2023-12-22 11:53:57,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0 2023-12-22 11:54:01,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=571413.3333333334, ans=0.0 2023-12-22 11:54:03,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=571413.3333333334, ans=0.0 2023-12-22 11:54:06,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=571413.3333333334, ans=0.0 2023-12-22 11:54:07,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=571413.3333333334, ans=0.0 2023-12-22 11:54:11,478 INFO [train.py:886] (1/4) Epoch 18, batch 4700, loss[loss=0.01673, audio_tagging_loss=0.01673, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4954750.91 frames. 
], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:54:17,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=571480.0, ans=0.125 2023-12-22 11:54:26,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=571546.6666666666, ans=0.2 2023-12-22 11:54:26,976 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.832e+01 2.995e+01 3.118e+01 3.636e+01, threshold=5.991e+01, percent-clipped=0.0 2023-12-22 11:54:40,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.06 vs. limit=10.0 2023-12-22 11:54:41,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2023-12-22 11:54:46,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=571680.0, ans=0.2 2023-12-22 11:54:57,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=571813.3333333334, ans=0.125 2023-12-22 11:54:58,487 INFO [train.py:886] (1/4) Epoch 18, batch 4750, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4953480.73 frames. ], batch size: 100, lr: 5.96e-03, grad_scale: 32.0 2023-12-22 11:55:11,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=571880.0, ans=0.0 2023-12-22 11:55:34,688 INFO [train.py:886] (1/4) Epoch 19, batch 0, loss[loss=0.03635, audio_tagging_loss=0.03635, over 20849.00 frames. ], tot_loss[loss=0.03635, audio_tagging_loss=0.03635, over 20849.00 frames. ], batch size: 107, lr: 5.80e-03, grad_scale: 32.0 2023-12-22 11:55:34,688 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 11:55:56,081 INFO [train.py:917] (1/4) Epoch 19, validation: loss=0.03209, audio_tagging_loss=0.03209, over 3737520.00 frames. 2023-12-22 11:55:56,082 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 11:55:56,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=571920.0, ans=0.2 2023-12-22 11:56:12,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=571986.6666666666, ans=0.2 2023-12-22 11:56:13,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=571986.6666666666, ans=0.125 2023-12-22 11:56:31,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=572120.0, ans=6.0 2023-12-22 11:56:32,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=572120.0, ans=0.125 2023-12-22 11:56:36,962 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.356e-01 2023-12-22 11:56:46,861 INFO [train.py:886] (1/4) Epoch 19, batch 50, loss[loss=0.01912, audio_tagging_loss=0.01912, over 25000.00 frames. ], tot_loss[loss=0.02187, audio_tagging_loss=0.02187, over 1112519.54 frames. 
], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:56:47,743 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+01 3.056e+01 3.497e+01 4.227e+01 9.985e+01, threshold=6.993e+01, percent-clipped=7.0 2023-12-22 11:56:53,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=572253.3333333334, ans=0.125 2023-12-22 11:56:55,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=572253.3333333334, ans=0.125 2023-12-22 11:57:16,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=572453.3333333334, ans=0.0 2023-12-22 11:57:16,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572453.3333333334, ans=0.1 2023-12-22 11:57:23,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2023-12-22 11:57:38,771 INFO [train.py:886] (1/4) Epoch 19, batch 100, loss[loss=0.01764, audio_tagging_loss=0.01764, over 25000.00 frames. ], tot_loss[loss=0.01898, audio_tagging_loss=0.01898, over 1969309.28 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:57:39,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=572586.6666666666, ans=0.125 2023-12-22 11:57:41,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=572586.6666666666, ans=0.125 2023-12-22 11:57:44,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=572586.6666666666, ans=0.125 2023-12-22 11:57:49,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=572653.3333333334, ans=0.2 2023-12-22 11:57:52,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=572653.3333333334, ans=0.2 2023-12-22 11:58:01,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=572720.0, ans=0.1 2023-12-22 11:58:02,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=572720.0, ans=0.2 2023-12-22 11:58:15,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=572786.6666666666, ans=0.125 2023-12-22 11:58:17,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=12.0 2023-12-22 11:58:22,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=572853.3333333334, ans=0.0 2023-12-22 11:58:30,350 INFO [train.py:886] (1/4) Epoch 19, batch 150, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01728, audio_tagging_loss=0.01728, over 2638733.76 frames. 
], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:58:31,285 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.740e+01 3.036e+01 3.224e+01 3.385e+01 4.253e+01, threshold=6.449e+01, percent-clipped=0.0 2023-12-22 11:58:34,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572920.0, ans=0.125 2023-12-22 11:58:34,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-12-22 11:59:06,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=573120.0, ans=0.125 2023-12-22 11:59:07,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=573120.0, ans=0.1 2023-12-22 11:59:18,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=573186.6666666666, ans=0.125 2023-12-22 11:59:19,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=573186.6666666666, ans=0.125 2023-12-22 11:59:22,110 INFO [train.py:886] (1/4) Epoch 19, batch 200, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24025.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 3154608.98 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:59:25,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=573253.3333333334, ans=0.0 2023-12-22 11:59:33,044 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:59:33,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=573320.0, ans=0.125 2023-12-22 11:59:34,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=573320.0, ans=0.125 2023-12-22 12:00:02,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=573520.0, ans=0.0 2023-12-22 12:00:03,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=573520.0, ans=0.09899494936611666 2023-12-22 12:00:14,378 INFO [train.py:886] (1/4) Epoch 19, batch 250, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 3559928.77 frames. 
], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:00:15,317 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.866e+01 3.015e+01 3.184e+01 3.683e+01, threshold=6.030e+01, percent-clipped=0.0 2023-12-22 12:00:15,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=573586.6666666666, ans=0.125 2023-12-22 12:00:20,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=573586.6666666666, ans=0.125 2023-12-22 12:00:20,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=573586.6666666666, ans=0.2 2023-12-22 12:00:28,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=573653.3333333334, ans=0.04949747468305833 2023-12-22 12:00:53,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573786.6666666666, ans=0.125 2023-12-22 12:00:54,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2023-12-22 12:01:01,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=573853.3333333334, ans=0.0 2023-12-22 12:01:03,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-12-22 12:01:06,097 INFO [train.py:886] (1/4) Epoch 19, batch 300, loss[loss=0.01259, audio_tagging_loss=0.01259, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 3869183.42 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:01:07,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=573920.0, ans=0.125 2023-12-22 12:01:16,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=573986.6666666666, ans=0.0 2023-12-22 12:01:33,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=574053.3333333334, ans=0.0 2023-12-22 12:01:35,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-22 12:01:42,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=574120.0, ans=0.125 2023-12-22 12:01:57,906 INFO [train.py:886] (1/4) Epoch 19, batch 350, loss[loss=0.01645, audio_tagging_loss=0.01645, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4099647.90 frames. 
], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:01:58,805 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.631e+01 2.896e+01 3.022e+01 3.147e+01 3.983e+01, threshold=6.044e+01, percent-clipped=0.0 2023-12-22 12:01:59,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=574253.3333333334, ans=0.0 2023-12-22 12:02:07,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574320.0, ans=0.1 2023-12-22 12:02:30,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-12-22 12:02:46,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=574520.0, ans=0.125 2023-12-22 12:02:46,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574520.0, ans=0.1 2023-12-22 12:02:50,302 INFO [train.py:886] (1/4) Epoch 19, batch 400, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4291076.57 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:03:11,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=574720.0, ans=0.0 2023-12-22 12:03:27,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=574786.6666666666, ans=0.0 2023-12-22 12:03:42,121 INFO [train.py:886] (1/4) Epoch 19, batch 450, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4438696.20 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:03:43,024 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.821e+01 2.989e+01 3.118e+01 4.084e+01, threshold=5.979e+01, percent-clipped=0.0 2023-12-22 12:03:46,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574920.0, ans=0.0 2023-12-22 12:03:49,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=15.0 2023-12-22 12:03:50,249 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:04:08,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=575053.3333333334, ans=0.125 2023-12-22 12:04:08,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=575053.3333333334, ans=0.2 2023-12-22 12:04:08,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=575053.3333333334, ans=0.2 2023-12-22 12:04:09,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=575053.3333333334, ans=0.0 2023-12-22 12:04:18,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=575120.0, ans=0.125 2023-12-22 12:04:34,415 INFO [train.py:886] (1/4) Epoch 19, batch 500, loss[loss=0.01074, audio_tagging_loss=0.01074, over 23922.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4551827.43 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:04:35,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:04:37,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=575253.3333333334, ans=0.09899494936611666 2023-12-22 12:04:51,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=575320.0, ans=0.125 2023-12-22 12:05:04,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=575453.3333333334, ans=0.0 2023-12-22 12:05:20,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=575520.0, ans=0.0 2023-12-22 12:05:25,819 INFO [train.py:886] (1/4) Epoch 19, batch 550, loss[loss=0.01168, audio_tagging_loss=0.01168, over 24750.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4642174.83 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:05:27,457 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.644e+01 2.851e+01 2.999e+01 3.143e+01 3.549e+01, threshold=5.998e+01, percent-clipped=0.0 2023-12-22 12:05:30,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=575586.6666666666, ans=0.2 2023-12-22 12:05:31,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.26 vs. 
limit=12.0 2023-12-22 12:05:36,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=575653.3333333334, ans=0.09899494936611666 2023-12-22 12:05:38,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=575653.3333333334, ans=0.1 2023-12-22 12:05:40,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=575653.3333333334, ans=0.125 2023-12-22 12:05:40,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=575653.3333333334, ans=0.2 2023-12-22 12:05:41,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=575653.3333333334, ans=0.0 2023-12-22 12:06:17,526 INFO [train.py:886] (1/4) Epoch 19, batch 600, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4708631.39 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:06:17,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=575920.0, ans=0.125 2023-12-22 12:06:23,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-22 12:06:32,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=575986.6666666666, ans=0.0 2023-12-22 12:06:44,676 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:06:57,954 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:07:10,151 INFO [train.py:886] (1/4) Epoch 19, batch 650, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4753391.86 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:07:10,357 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:07:11,728 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.622e+01 2.907e+01 3.049e+01 3.128e+01 3.545e+01, threshold=6.097e+01, percent-clipped=0.0 2023-12-22 12:07:13,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=576253.3333333334, ans=0.125 2023-12-22 12:07:21,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=576320.0, ans=0.125 2023-12-22 12:07:29,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=576386.6666666666, ans=0.2 2023-12-22 12:07:37,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. 
limit=15.0 2023-12-22 12:07:45,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576453.3333333334, ans=0.125 2023-12-22 12:07:57,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=576520.0, ans=0.05 2023-12-22 12:08:01,305 INFO [train.py:886] (1/4) Epoch 19, batch 700, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4796009.58 frames. ], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:08:14,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=576653.3333333334, ans=0.125 2023-12-22 12:08:16,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=576653.3333333334, ans=0.0 2023-12-22 12:08:16,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=576653.3333333334, ans=0.0 2023-12-22 12:08:28,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=576720.0, ans=0.125 2023-12-22 12:08:30,707 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:08:41,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=576786.6666666666, ans=0.0 2023-12-22 12:08:49,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-12-22 12:08:49,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2023-12-22 12:08:53,212 INFO [train.py:886] (1/4) Epoch 19, batch 750, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4830196.38 frames. ], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:08:54,164 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.842e+01 3.000e+01 3.145e+01 3.640e+01, threshold=6.000e+01, percent-clipped=0.0 2023-12-22 12:08:57,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=576920.0, ans=0.125 2023-12-22 12:08:59,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=576920.0, ans=0.125 2023-12-22 12:09:02,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=576986.6666666666, ans=0.125 2023-12-22 12:09:06,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=576986.6666666666, ans=0.0 2023-12-22 12:09:06,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-12-22 12:09:08,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=27.62 vs. 
limit=15.0 2023-12-22 12:09:23,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=577120.0, ans=0.125 2023-12-22 12:09:24,608 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:09:30,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=577120.0, ans=0.125 2023-12-22 12:09:37,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=577186.6666666666, ans=0.0 2023-12-22 12:09:44,888 INFO [train.py:886] (1/4) Epoch 19, batch 800, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4862304.38 frames. ], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:09:53,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=12.0 2023-12-22 12:10:25,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=577453.3333333334, ans=0.125 2023-12-22 12:10:28,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=577520.0, ans=0.125 2023-12-22 12:10:36,765 INFO [train.py:886] (1/4) Epoch 19, batch 850, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4882192.29 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:10:37,695 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.538e+01 2.855e+01 2.981e+01 3.141e+01 3.623e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 12:10:46,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=577653.3333333334, ans=0.2 2023-12-22 12:10:56,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=577653.3333333334, ans=0.0 2023-12-22 12:11:11,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=577786.6666666666, ans=0.125 2023-12-22 12:11:22,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=577853.3333333334, ans=0.125 2023-12-22 12:11:29,935 INFO [train.py:886] (1/4) Epoch 19, batch 900, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4899130.89 frames. 
], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:11:30,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=577920.0, ans=0.125 2023-12-22 12:11:37,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=577920.0, ans=0.0 2023-12-22 12:11:40,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=577986.6666666666, ans=0.0 2023-12-22 12:11:41,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=577986.6666666666, ans=0.125 2023-12-22 12:11:43,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577986.6666666666, ans=0.1 2023-12-22 12:11:45,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=577986.6666666666, ans=0.125 2023-12-22 12:11:56,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=578053.3333333334, ans=0.2 2023-12-22 12:11:56,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=578053.3333333334, ans=0.125 2023-12-22 12:11:56,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=578053.3333333334, ans=0.2 2023-12-22 12:11:57,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=578053.3333333334, ans=0.0 2023-12-22 12:11:59,660 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:12:19,891 INFO [train.py:886] (1/4) Epoch 19, batch 950, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4906423.31 frames. ], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:12:20,791 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+01 2.852e+01 2.989e+01 3.124e+01 4.073e+01, threshold=5.979e+01, percent-clipped=0.0 2023-12-22 12:12:27,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=578253.3333333334, ans=0.125 2023-12-22 12:12:32,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=578320.0, ans=0.2 2023-12-22 12:12:37,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=578320.0, ans=0.0 2023-12-22 12:12:40,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-12-22 12:12:43,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=578386.6666666666, ans=0.09899494936611666 2023-12-22 12:12:57,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. 
limit=6.0 2023-12-22 12:12:58,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=578453.3333333334, ans=0.5 2023-12-22 12:13:06,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=578520.0, ans=0.125 2023-12-22 12:13:11,958 INFO [train.py:886] (1/4) Epoch 19, batch 1000, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4917561.19 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:13:15,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=578586.6666666666, ans=0.2 2023-12-22 12:13:21,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578653.3333333334, ans=0.1 2023-12-22 12:13:29,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578653.3333333334, ans=0.125 2023-12-22 12:13:45,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.59 vs. limit=5.0 2023-12-22 12:13:59,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=578853.3333333334, ans=0.125 2023-12-22 12:14:04,566 INFO [train.py:886] (1/4) Epoch 19, batch 1050, loss[loss=0.01436, audio_tagging_loss=0.01436, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4929770.44 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:14:05,503 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 2.856e+01 2.980e+01 3.114e+01 3.633e+01, threshold=5.960e+01, percent-clipped=0.0 2023-12-22 12:14:07,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=578920.0, ans=0.125 2023-12-22 12:14:44,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=579120.0, ans=0.0 2023-12-22 12:14:52,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=579186.6666666666, ans=0.0 2023-12-22 12:14:55,540 INFO [train.py:886] (1/4) Epoch 19, batch 1100, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4934306.88 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:14:56,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2023-12-22 12:15:03,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579253.3333333334, ans=0.125 2023-12-22 12:15:18,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=579386.6666666666, ans=0.125 2023-12-22 12:15:33,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. 
limit=15.0 2023-12-22 12:15:47,817 INFO [train.py:886] (1/4) Epoch 19, batch 1150, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24006.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4932120.76 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:15:49,405 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.855e+01 2.981e+01 3.079e+01 3.493e+01, threshold=5.963e+01, percent-clipped=0.0 2023-12-22 12:16:25,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=12.0 2023-12-22 12:16:26,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=579786.6666666666, ans=0.2 2023-12-22 12:16:39,355 INFO [train.py:886] (1/4) Epoch 19, batch 1200, loss[loss=0.0149, audio_tagging_loss=0.0149, over 22237.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4937041.51 frames. ], batch size: 107, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:16:43,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=579920.0, ans=0.125 2023-12-22 12:16:55,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2023-12-22 12:17:04,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=580053.3333333334, ans=0.125 2023-12-22 12:17:10,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=580120.0, ans=0.125 2023-12-22 12:17:21,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=580186.6666666666, ans=0.125 2023-12-22 12:17:31,934 INFO [train.py:886] (1/4) Epoch 19, batch 1250, loss[loss=0.01385, audio_tagging_loss=0.01385, over 21933.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4935515.86 frames. ], batch size: 107, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:17:32,871 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.968e+01 3.088e+01 3.270e+01 3.810e+01, threshold=6.176e+01, percent-clipped=0.0 2023-12-22 12:17:41,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.99 vs. 
limit=15.0 2023-12-22 12:17:56,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=580386.6666666666, ans=0.05 2023-12-22 12:18:00,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580386.6666666666, ans=0.1 2023-12-22 12:18:07,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=580453.3333333334, ans=0.125 2023-12-22 12:18:11,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=580453.3333333334, ans=0.125 2023-12-22 12:18:12,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=580520.0, ans=0.0 2023-12-22 12:18:24,363 INFO [train.py:886] (1/4) Epoch 19, batch 1300, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4937995.65 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:18:30,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=580586.6666666666, ans=0.1 2023-12-22 12:18:53,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=580720.0, ans=0.07 2023-12-22 12:18:59,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=580786.6666666666, ans=0.07 2023-12-22 12:19:16,602 INFO [train.py:886] (1/4) Epoch 19, batch 1350, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4944923.62 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:19:17,517 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.933e+01 3.068e+01 3.210e+01 4.169e+01, threshold=6.137e+01, percent-clipped=0.0 2023-12-22 12:19:17,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=580920.0, ans=0.2 2023-12-22 12:19:26,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=580986.6666666666, ans=0.0 2023-12-22 12:19:44,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=581053.3333333334, ans=0.125 2023-12-22 12:19:54,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=581120.0, ans=0.1 2023-12-22 12:20:01,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=12.0 2023-12-22 12:20:02,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=581186.6666666666, ans=0.0 2023-12-22 12:20:04,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.56 vs. 
limit=15.0 2023-12-22 12:20:06,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=581186.6666666666, ans=0.125 2023-12-22 12:20:08,147 INFO [train.py:886] (1/4) Epoch 19, batch 1400, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4943334.94 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 64.0 2023-12-22 12:20:08,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-12-22 12:20:10,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581253.3333333334, ans=0.1 2023-12-22 12:20:25,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=581320.0, ans=0.125 2023-12-22 12:20:40,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2023-12-22 12:21:00,410 INFO [train.py:886] (1/4) Epoch 19, batch 1450, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4950741.12 frames. ], batch size: 100, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:21:01,344 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 2.831e+01 2.949e+01 3.105e+01 4.075e+01, threshold=5.898e+01, percent-clipped=0.0 2023-12-22 12:21:15,776 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:21:24,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=581720.0, ans=0.0 2023-12-22 12:21:27,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=581720.0, ans=0.0 2023-12-22 12:21:28,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=581720.0, ans=0.0 2023-12-22 12:21:36,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=581786.6666666666, ans=0.125 2023-12-22 12:21:43,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-12-22 12:21:44,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=581853.3333333334, ans=0.0 2023-12-22 12:21:46,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=581853.3333333334, ans=0.125 2023-12-22 12:21:47,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.50 vs. limit=22.5 2023-12-22 12:21:51,573 INFO [train.py:886] (1/4) Epoch 19, batch 1500, loss[loss=0.01358, audio_tagging_loss=0.01358, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4954063.43 frames. 
], batch size: 99, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:22:07,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=581986.6666666666, ans=0.125 2023-12-22 12:22:11,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=581986.6666666666, ans=0.2 2023-12-22 12:22:17,123 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:22:20,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=22.5 2023-12-22 12:22:22,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.30 vs. limit=15.0 2023-12-22 12:22:36,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=582186.6666666666, ans=0.2 2023-12-22 12:22:44,283 INFO [train.py:886] (1/4) Epoch 19, batch 1550, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4954569.83 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:22:45,196 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.901e+01 3.016e+01 3.205e+01 3.562e+01, threshold=6.032e+01, percent-clipped=0.0 2023-12-22 12:23:02,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=582320.0, ans=0.0 2023-12-22 12:23:35,105 INFO [train.py:886] (1/4) Epoch 19, batch 1600, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4953257.79 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:23:44,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=582586.6666666666, ans=0.0 2023-12-22 12:23:44,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=582586.6666666666, ans=0.125 2023-12-22 12:23:48,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=582653.3333333334, ans=0.125 2023-12-22 12:23:55,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=582720.0, ans=0.2 2023-12-22 12:23:56,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2023-12-22 12:24:16,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582853.3333333334, ans=0.1 2023-12-22 12:24:17,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=582853.3333333334, ans=0.125 2023-12-22 12:24:26,985 INFO [train.py:886] (1/4) Epoch 19, batch 1650, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4950889.94 frames. 
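The recurring `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` warnings come from a clipping scheme that tracks recent gradient norms and derives the clip threshold from their statistics rather than using a fixed constant. A hedged sketch of that bookkeeping, assuming a median-based threshold with the logged scale factor of 2.0 (the real optim.py logic may differ in detail):

```python
from collections import deque

import torch

class GradNormClipperSketch:
    """Clip gradients at clipping_scale times the median of recent norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(norm)
        q = torch.tensor(sorted(self.norms))
        # min / 25% / median / 75% / max, as in the logged quartiles
        quartiles = [q[int(r * (len(q) - 1))].item()
                     for r in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # 2.0 x median
        if norm > threshold:  # such batches feed the "percent-clipped" stat
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```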
], batch size: 100, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:24:27,922 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.591e+01 2.832e+01 3.010e+01 3.174e+01 4.519e+01, threshold=6.020e+01, percent-clipped=0.0 2023-12-22 12:24:48,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=583053.3333333334, ans=0.0 2023-12-22 12:24:55,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=583053.3333333334, ans=0.125 2023-12-22 12:25:07,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=583186.6666666666, ans=0.0 2023-12-22 12:25:20,035 INFO [train.py:886] (1/4) Epoch 19, batch 1700, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4950478.91 frames. ], batch size: 100, lr: 5.75e-03, grad_scale: 64.0 2023-12-22 12:25:36,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=583320.0, ans=0.125 2023-12-22 12:25:45,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583386.6666666666, ans=0.1 2023-12-22 12:25:50,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=10.0 2023-12-22 12:25:53,329 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:25:59,962 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:26:02,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=583520.0, ans=0.125 2023-12-22 12:26:10,347 INFO [train.py:886] (1/4) Epoch 19, batch 1750, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4951347.00 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:26:11,961 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.867e+01 2.991e+01 3.170e+01 3.971e+01, threshold=5.982e+01, percent-clipped=0.0 2023-12-22 12:26:21,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=12.0 2023-12-22 12:26:32,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.34 vs. limit=10.0 2023-12-22 12:26:32,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2023-12-22 12:26:51,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=22.5 2023-12-22 12:27:01,899 INFO [train.py:886] (1/4) Epoch 19, batch 1800, loss[loss=0.01413, audio_tagging_loss=0.01413, over 21998.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4946940.47 frames. 
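The `Whitening: ... metric=X vs. limit=Y` lines measure how far a module's channel covariance is from being white (a multiple of the identity); the limit is the value beyond which a corrective gradient kicks in. One plausible proxy for that metric, hedged against the exact scaling.py formula:

```python
import torch

def whitening_metric_sketch(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (frames, channels). Returns 1.0 for perfectly white features,
    larger values for more anisotropic covariance, per channel group."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / n
        d = cov.shape[0]
        # ratio of mean squared eigenvalue to squared mean eigenvalue,
        # computed via traces to avoid an eigendecomposition.
        metrics.append(((cov @ cov).trace() * d / cov.trace() ** 2).item())
    return max(metrics)

# A line like "metric=21.78 vs. limit=22.5" means the group is close to,
# but still under, the point where the whitening penalty would engage.
```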
], batch size: 107, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:27:02,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583920.0, ans=0.1 2023-12-22 12:27:02,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=583920.0, ans=0.04949747468305833 2023-12-22 12:27:03,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=583920.0, ans=0.125 2023-12-22 12:27:05,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=583920.0, ans=0.0 2023-12-22 12:27:20,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=583986.6666666666, ans=0.125 2023-12-22 12:27:54,624 INFO [train.py:886] (1/4) Epoch 19, batch 1850, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4952627.51 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:27:55,542 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.889e+01 3.007e+01 3.145e+01 3.702e+01, threshold=6.015e+01, percent-clipped=0.0 2023-12-22 12:27:55,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=584253.3333333334, ans=0.0 2023-12-22 12:28:04,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=584320.0, ans=0.125 2023-12-22 12:28:05,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-12-22 12:28:09,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=584320.0, ans=0.07 2023-12-22 12:28:11,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=584320.0, ans=0.125 2023-12-22 12:28:46,475 INFO [train.py:886] (1/4) Epoch 19, batch 1900, loss[loss=0.01356, audio_tagging_loss=0.01356, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4945860.11 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:28:54,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=584586.6666666666, ans=0.125 2023-12-22 12:28:55,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-12-22 12:29:08,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=15.0 2023-12-22 12:29:09,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2023-12-22 12:29:39,775 INFO [train.py:886] (1/4) Epoch 19, batch 1950, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4941608.26 frames. 
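The `audio_tagging_loss` tracked in every batch line is a multi-label criterion over the 527 AudioSet event classes. A minimal sketch of such a criterion, assuming time-pooled logits and multi-hot targets (binary cross-entropy; the recipe's exact reduction may differ):

```python
import torch
import torch.nn.functional as F

def audio_tagging_loss_sketch(logits: torch.Tensor,
                              targets: torch.Tensor) -> torch.Tensor:
    """logits: (batch, 527), pooled over time; targets: multi-hot (batch, 527)."""
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

logits = torch.randn(4, 527)
targets = torch.zeros(4, 527)
targets[0, [0, 137]] = 1.0  # e.g. two events active in the first clip
print(audio_tagging_loss_sketch(logits, targets))  # ~0.7 untrained,
# versus the ~0.014 the log shows after 19 epochs.
```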
], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:29:40,704 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.861e+01 2.979e+01 3.122e+01 3.684e+01, threshold=5.958e+01, percent-clipped=0.0 2023-12-22 12:29:44,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=10.0 2023-12-22 12:30:09,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=585053.3333333334, ans=0.0 2023-12-22 12:30:13,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585120.0, ans=0.1 2023-12-22 12:30:20,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585186.6666666666, ans=0.0 2023-12-22 12:30:21,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=585186.6666666666, ans=0.0 2023-12-22 12:30:24,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=585186.6666666666, ans=0.125 2023-12-22 12:30:25,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-12-22 12:30:30,755 INFO [train.py:886] (1/4) Epoch 19, batch 2000, loss[loss=0.01658, audio_tagging_loss=0.01658, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4940767.70 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0 2023-12-22 12:30:31,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=585253.3333333334, ans=0.2 2023-12-22 12:30:46,642 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.474e-03 2023-12-22 12:30:58,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=585386.6666666666, ans=0.0 2023-12-22 12:31:11,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585453.3333333334, ans=0.1 2023-12-22 12:31:23,496 INFO [train.py:886] (1/4) Epoch 19, batch 2050, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4943401.62 frames. 
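The `bypass.skip_rate` and `bypass.scale_min` entries belong to the Zipformer-style bypass connections: each layer's output is a per-channel interpolation between the layer input and output, with a learnable scale clamped from below by a scheduled `scale_min` (0.2 in many lines here). A hedged sketch of that module:

```python
import torch
import torch.nn as nn

class BypassSketch(nn.Module):
    """y = (1 - s) * x_in + s * x_out, with s learnable per channel
    and clamped to [scale_min, 1.0]; scale_min is itself scheduled in
    the real recipe.  Illustrative, not the exact icefall class."""

    def __init__(self, channels: int, scale_min: float = 0.2):
        super().__init__()
        self.scale = nn.Parameter(torch.full((channels,), 0.5))
        self.scale_min = scale_min

    def forward(self, x_in: torch.Tensor, x_out: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x_in + s * (x_out - x_in)
```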
], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:31:24,385 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.887e+01 3.027e+01 3.183e+01 3.649e+01, threshold=6.055e+01, percent-clipped=0.0 2023-12-22 12:31:39,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=585653.3333333334, ans=15.0 2023-12-22 12:31:54,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=585786.6666666666, ans=0.0 2023-12-22 12:32:04,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=585853.3333333334, ans=0.0 2023-12-22 12:32:13,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=585853.3333333334, ans=0.125 2023-12-22 12:32:15,256 INFO [train.py:886] (1/4) Epoch 19, batch 2100, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4948952.31 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:32:18,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=585920.0, ans=0.1 2023-12-22 12:32:22,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=585920.0, ans=0.2 2023-12-22 12:32:25,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=585986.6666666666, ans=0.1 2023-12-22 12:32:29,180 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.657e-03 2023-12-22 12:32:40,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=586053.3333333334, ans=0.09899494936611666 2023-12-22 12:32:50,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=586120.0, ans=0.2 2023-12-22 12:33:06,972 INFO [train.py:886] (1/4) Epoch 19, batch 2150, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4951750.87 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:33:07,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.855e+01 2.979e+01 3.117e+01 3.544e+01, threshold=5.958e+01, percent-clipped=0.0 2023-12-22 12:33:51,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=586520.0, ans=0.125 2023-12-22 12:33:55,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=586520.0, ans=0.0 2023-12-22 12:33:57,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=586520.0, ans=0.125 2023-12-22 12:33:59,118 INFO [train.py:886] (1/4) Epoch 19, batch 2200, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4940574.20 frames. 
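The `WithLoss: name=...self_attn_weights, loss-sum=...` lines report a small auxiliary penalty attached to the attention weights: usually 0.000e+00, occasionally a few e-03 when the weights drift out of range. A hedged illustration of one such penalty (the magnitude-cap form and the cap value are assumptions, not the confirmed scaling.py mechanism):

```python
import torch

def attn_weights_penalty_sketch(attn_weights: torch.Tensor,
                                cap: float = 25.0) -> torch.Tensor:
    """Zero while the attention weights stay within +/- cap (the common
    case in the log), positive when their magnitudes blow up; the trainer
    would add this term to the main loss."""
    excess = (attn_weights.abs() - cap).clamp(min=0.0)
    return excess.sum()
```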
], batch size: 99, lr: 5.73e-03, grad_scale: 128.0 2023-12-22 12:34:03,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=586586.6666666666, ans=15.0 2023-12-22 12:34:16,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-12-22 12:34:17,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=586653.3333333334, ans=0.0 2023-12-22 12:34:18,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-12-22 12:34:24,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2023-12-22 12:34:25,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=586720.0, ans=0.1 2023-12-22 12:34:36,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586786.6666666666, ans=0.1 2023-12-22 12:34:53,655 INFO [train.py:886] (1/4) Epoch 19, batch 2250, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4940774.40 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 64.0 2023-12-22 12:34:55,531 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.594e+01 2.899e+01 3.031e+01 3.221e+01 3.742e+01, threshold=6.061e+01, percent-clipped=0.0 2023-12-22 12:35:05,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=586986.6666666666, ans=0.0 2023-12-22 12:35:16,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=587053.3333333334, ans=0.125 2023-12-22 12:35:33,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=587120.0, ans=0.2 2023-12-22 12:35:45,211 INFO [train.py:886] (1/4) Epoch 19, batch 2300, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4938595.92 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 64.0 2023-12-22 12:35:51,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587253.3333333334, ans=0.0 2023-12-22 12:35:51,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=587253.3333333334, ans=0.125 2023-12-22 12:35:56,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=587320.0, ans=0.07 2023-12-22 12:36:25,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=587453.3333333334, ans=0.2 2023-12-22 12:36:37,604 INFO [train.py:886] (1/4) Epoch 19, batch 2350, loss[loss=0.01682, audio_tagging_loss=0.01682, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4938486.48 frames. 
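`grad_scale` in the batch lines steps between 64.0 and 128.0: standard dynamic loss scaling for the fp16 training used in this run, where the scale doubles after a stretch of overflow-free steps and halves when a step overflows. PyTorch's stock AMP scaler exposes the same policy (hedged: icefall ships its own scaler with extra checks, but the mechanics are analogous):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=64.0,       # matches the grad_scale seen above
    growth_factor=2.0,     # 64 -> 128 after a clean stretch
    backoff_factor=0.5,    # 128 -> 64 again if a step overflows
    growth_interval=2000,  # overflow-free steps required before growing
)

# usage inside the train loop:
# with torch.cuda.amp.autocast():
#     loss = model(batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()
```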
], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:36:39,495 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+01 2.873e+01 2.999e+01 3.144e+01 3.595e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 12:36:47,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587653.3333333334, ans=0.1 2023-12-22 12:36:52,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=587653.3333333334, ans=0.2 2023-12-22 12:36:58,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=587720.0, ans=0.125 2023-12-22 12:37:01,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=587720.0, ans=0.125 2023-12-22 12:37:29,349 INFO [train.py:886] (1/4) Epoch 19, batch 2400, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4943266.03 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:37:47,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=587986.6666666666, ans=15.0 2023-12-22 12:37:48,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=587986.6666666666, ans=0.125 2023-12-22 12:38:02,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0 2023-12-22 12:38:15,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588186.6666666666, ans=0.1 2023-12-22 12:38:18,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5 2023-12-22 12:38:19,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=588186.6666666666, ans=0.0 2023-12-22 12:38:20,931 INFO [train.py:886] (1/4) Epoch 19, batch 2450, loss[loss=0.01562, audio_tagging_loss=0.01562, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4946388.25 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:38:22,761 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 2.869e+01 3.044e+01 3.154e+01 3.723e+01, threshold=6.089e+01, percent-clipped=0.0 2023-12-22 12:39:00,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588453.3333333334, ans=0.1 2023-12-22 12:39:00,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=588453.3333333334, ans=0.0 2023-12-22 12:39:07,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=588520.0, ans=15.0 2023-12-22 12:39:10,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=588520.0, ans=0.125 2023-12-22 12:39:13,124 INFO [train.py:886] (1/4) Epoch 19, batch 2500, loss[loss=0.01484, audio_tagging_loss=0.01484, over 24750.00 frames. 
], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4948555.35 frames. ], batch size: 99, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:39:27,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=588653.3333333334, ans=0.125 2023-12-22 12:39:31,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=588653.3333333334, ans=0.1 2023-12-22 12:39:46,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0 2023-12-22 12:39:58,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5 2023-12-22 12:40:03,879 INFO [train.py:886] (1/4) Epoch 19, batch 2550, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4947819.04 frames. ], batch size: 99, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:40:06,708 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.587e+01 2.976e+01 3.078e+01 3.235e+01 3.985e+01, threshold=6.155e+01, percent-clipped=0.0 2023-12-22 12:40:21,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588986.6666666666, ans=0.1 2023-12-22 12:40:32,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=589053.3333333334, ans=0.1 2023-12-22 12:40:39,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=589120.0, ans=0.0 2023-12-22 12:40:56,968 INFO [train.py:886] (1/4) Epoch 19, batch 2600, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4947447.71 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0 2023-12-22 12:41:03,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=589253.3333333334, ans=0.2 2023-12-22 12:41:15,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=589320.0, ans=0.0 2023-12-22 12:41:27,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=589453.3333333334, ans=0.0 2023-12-22 12:41:38,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589520.0, ans=0.1 2023-12-22 12:41:48,942 INFO [train.py:886] (1/4) Epoch 19, batch 2650, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4949578.99 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:41:51,500 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.856e+01 2.983e+01 3.159e+01 3.716e+01, threshold=5.966e+01, percent-clipped=0.0 2023-12-22 12:42:04,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=589653.3333333334, ans=0.125 2023-12-22 12:42:05,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.26 vs. 
limit=12.0 2023-12-22 12:42:09,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=589720.0, ans=0.0 2023-12-22 12:42:11,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=589720.0, ans=0.2 2023-12-22 12:42:20,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=589786.6666666666, ans=0.125 2023-12-22 12:42:23,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=589786.6666666666, ans=0.125 2023-12-22 12:42:25,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=589786.6666666666, ans=0.125 2023-12-22 12:42:38,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=589853.3333333334, ans=0.125 2023-12-22 12:42:41,023 INFO [train.py:886] (1/4) Epoch 19, batch 2700, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4957198.64 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:43:12,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=590120.0, ans=0.125 2023-12-22 12:43:19,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2023-12-22 12:43:20,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=590120.0, ans=0.125 2023-12-22 12:43:23,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=590186.6666666666, ans=0.0 2023-12-22 12:43:24,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=590186.6666666666, ans=0.0 2023-12-22 12:43:33,972 INFO [train.py:886] (1/4) Epoch 19, batch 2750, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4955525.90 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:43:35,870 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.856e+01 3.011e+01 3.165e+01 3.589e+01, threshold=6.021e+01, percent-clipped=0.0 2023-12-22 12:43:44,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=12.0 2023-12-22 12:43:44,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590320.0, ans=0.1 2023-12-22 12:44:11,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-12-22 12:44:18,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=590520.0, ans=0.125 2023-12-22 12:44:21,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.95 vs. 
limit=22.5 2023-12-22 12:44:24,033 INFO [train.py:886] (1/4) Epoch 19, batch 2800, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4956589.80 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:44:28,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.97 vs. limit=22.5 2023-12-22 12:44:53,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590720.0, ans=0.1 2023-12-22 12:45:07,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2023-12-22 12:45:16,450 INFO [train.py:886] (1/4) Epoch 19, batch 2850, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4949043.01 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:45:18,393 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+01 2.937e+01 3.059e+01 3.223e+01 3.901e+01, threshold=6.118e+01, percent-clipped=0.0 2023-12-22 12:45:23,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2023-12-22 12:45:24,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2023-12-22 12:45:28,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=590986.6666666666, ans=0.0 2023-12-22 12:45:33,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-12-22 12:45:34,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=590986.6666666666, ans=0.0 2023-12-22 12:45:38,727 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:45:46,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2023-12-22 12:46:02,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=591186.6666666666, ans=0.05 2023-12-22 12:46:02,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=591186.6666666666, ans=0.125 2023-12-22 12:46:08,814 INFO [train.py:886] (1/4) Epoch 19, batch 2900, loss[loss=0.01784, audio_tagging_loss=0.01784, over 24933.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4947515.56 frames. 
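The ubiquitous `balancer*.prob`, `balancer.min_positive`, and `balancer.max_positive` values describe activation balancers: with probability `prob` (0.125 in most lines) on any given batch, gradients are nudged so that each channel's fraction of positive activations stays inside [min_positive, max_positive]. A hedged, simplified sketch of the measurement half (the real version applies its correction inside autograd):

```python
import torch

def positive_fraction_violations(
    x: torch.Tensor, min_positive: float = 0.05, max_positive: float = 0.95
) -> torch.Tensor:
    """x: (frames, channels). Signed per-channel violation of the allowed
    range for the fraction of positive activations; the balancer would
    use this to scale gradients on the batches where it is active."""
    frac_pos = (x > 0).float().mean(dim=0)
    low = (min_positive - frac_pos).clamp(min=0.0)   # too rarely positive
    high = (frac_pos - max_positive).clamp(min=0.0)  # too often positive
    return high - low
```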
], batch size: 100, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:46:19,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=591320.0, ans=0.0 2023-12-22 12:46:32,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=591386.6666666666, ans=0.0 2023-12-22 12:46:47,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=591453.3333333334, ans=0.125 2023-12-22 12:47:00,312 INFO [train.py:886] (1/4) Epoch 19, batch 2950, loss[loss=0.01498, audio_tagging_loss=0.01498, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4948377.32 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0 2023-12-22 12:47:02,196 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 2.834e+01 2.936e+01 3.112e+01 3.649e+01, threshold=5.872e+01, percent-clipped=0.0 2023-12-22 12:47:03,323 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:47:05,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591586.6666666666, ans=0.1 2023-12-22 12:47:10,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=591653.3333333334, ans=0.0 2023-12-22 12:47:14,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591653.3333333334, ans=0.125 2023-12-22 12:47:15,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-12-22 12:47:47,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=591853.3333333334, ans=0.0 2023-12-22 12:47:49,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=591853.3333333334, ans=0.125 2023-12-22 12:47:53,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=591920.0, ans=0.125 2023-12-22 12:47:54,106 INFO [train.py:886] (1/4) Epoch 19, batch 3000, loss[loss=0.01563, audio_tagging_loss=0.01563, over 24055.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4949465.14 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:47:54,106 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 12:48:15,450 INFO [train.py:917] (1/4) Epoch 19, validation: loss=0.0333, audio_tagging_loss=0.0333, over 3737520.00 frames. 
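The validation block above (loss over the full 3,737,520-frame dev set, followed by a peak-memory line) follows the usual pattern: switch to eval mode, run the dev loader without gradients, then report `torch.cuda.max_memory_allocated()`. A hedged sketch of that step; the `model(batch)` call is a stand-in for the recipe-specific forward:

```python
import torch

def compute_validation_loss(model, dev_loader) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = model(batch)  # recipe-specific call
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames

# after validation, peak allocation is logged, e.g. "14765MB":
# print(f"Maximum memory allocated so far is "
#       f"{torch.cuda.max_memory_allocated() // 2**20}MB")
```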
2023-12-22 12:48:15,451 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 12:48:21,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=591920.0, ans=0.125 2023-12-22 12:48:30,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=591986.6666666666, ans=0.125 2023-12-22 12:48:48,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=592120.0, ans=0.04949747468305833 2023-12-22 12:48:52,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=592120.0, ans=0.125 2023-12-22 12:49:06,694 INFO [train.py:886] (1/4) Epoch 19, batch 3050, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4954272.11 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:49:07,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592253.3333333334, ans=0.1 2023-12-22 12:49:08,540 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.886e+01 3.017e+01 3.136e+01 3.625e+01, threshold=6.033e+01, percent-clipped=0.0 2023-12-22 12:49:16,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=592320.0, ans=0.0 2023-12-22 12:49:18,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=592320.0, ans=0.0 2023-12-22 12:49:18,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-12-22 12:49:31,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=592386.6666666666, ans=0.2 2023-12-22 12:49:38,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=592453.3333333334, ans=0.0 2023-12-22 12:49:42,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.62 vs. limit=5.0 2023-12-22 12:49:43,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=592453.3333333334, ans=0.125 2023-12-22 12:49:44,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=592453.3333333334, ans=0.2 2023-12-22 12:49:45,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:49:49,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-12-22 12:49:50,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=592520.0, ans=0.125 2023-12-22 12:49:59,761 INFO [train.py:886] (1/4) Epoch 19, batch 3100, loss[loss=0.0168, audio_tagging_loss=0.0168, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4956573.01 frames. 
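Each batch line reports both the current batch's loss and a `tot_loss` over roughly 4.9M frames: a frame-weighted aggregate of recent batches rather than a plain epoch mean. A hedged sketch of such bookkeeping, assuming geometric forgetting (icefall's actual tracker may aggregate differently):

```python
class RunningLossSketch:
    """Frame-weighted running loss with geometric forgetting."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_frames = 0.0  # decayed sum of loss * frames
        self.frames = 0.0       # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> float:
        self.loss_frames = self.decay * self.loss_frames + loss * num_frames
        self.frames = self.decay * self.frames + num_frames
        return self.loss_frames / self.frames  # the logged tot_loss

tracker = RunningLossSketch()
for loss, frames in [(0.0147, 25000.0), (0.0123, 24750.0)]:
    print(tracker.update(loss, frames))
```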
], batch size: 99, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:50:01,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=592586.6666666666, ans=0.0 2023-12-22 12:50:02,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2023-12-22 12:50:20,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=592720.0, ans=0.125 2023-12-22 12:50:25,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=592720.0, ans=0.2 2023-12-22 12:50:26,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=592720.0, ans=0.2 2023-12-22 12:50:28,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-12-22 12:50:37,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=592786.6666666666, ans=0.125 2023-12-22 12:50:50,373 INFO [train.py:886] (1/4) Epoch 19, batch 3150, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4949274.31 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:50:52,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+01 2.871e+01 2.985e+01 3.126e+01 3.979e+01, threshold=5.970e+01, percent-clipped=0.0 2023-12-22 12:51:05,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=592986.6666666666, ans=0.125 2023-12-22 12:51:42,997 INFO [train.py:886] (1/4) Epoch 19, batch 3200, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4949928.12 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:51:57,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=593320.0, ans=0.125 2023-12-22 12:52:04,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=593386.6666666666, ans=0.0 2023-12-22 12:52:18,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2023-12-22 12:52:35,663 INFO [train.py:886] (1/4) Epoch 19, batch 3250, loss[loss=0.01389, audio_tagging_loss=0.01389, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4949929.97 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0 2023-12-22 12:52:37,602 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+01 2.874e+01 3.008e+01 3.221e+01 3.622e+01, threshold=6.016e+01, percent-clipped=0.0 2023-12-22 12:52:38,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=10.0 2023-12-22 12:53:07,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=593786.6666666666, ans=0.0 2023-12-22 12:53:07,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=593786.6666666666, ans=0.125 2023-12-22 12:53:13,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=593786.6666666666, ans=0.125 2023-12-22 12:53:18,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=593853.3333333334, ans=0.125 2023-12-22 12:53:25,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=593853.3333333334, ans=0.2 2023-12-22 12:53:26,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593920.0, ans=0.1 2023-12-22 12:53:27,375 INFO [train.py:886] (1/4) Epoch 19, batch 3300, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4944389.88 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:53:28,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-12-22 12:53:55,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=594053.3333333334, ans=0.125 2023-12-22 12:53:56,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=594053.3333333334, ans=0.125 2023-12-22 12:54:16,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=594186.6666666666, ans=0.125 2023-12-22 12:54:19,662 INFO [train.py:886] (1/4) Epoch 19, batch 3350, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4939635.84 frames. 
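Many of the `ScheduledFloat` lines name a `feed_forward*.out_proj.dropout_p` sitting at 0.1: dropout probabilities are themselves schedule-driven, so a module re-reads its p from the schedule on every step instead of fixing it at construction. A hedged sketch of wiring a schedule (e.g. the `ScheduledFloatSketch` above) into `F.dropout`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScheduledDropoutSketch(nn.Module):
    """Dropout whose probability is looked up from a schedule per step."""

    def __init__(self, schedule, get_batch_count):
        super().__init__()
        self.schedule = schedule              # e.g. ScheduledFloatSketch
        self.get_batch_count = get_batch_count  # callable -> current count

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.schedule.value_at(self.get_batch_count())
        return F.dropout(x, p=p, training=self.training)
```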
], batch size: 100, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:54:21,565 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+01 2.848e+01 3.000e+01 3.148e+01 3.687e+01, threshold=5.999e+01, percent-clipped=0.0 2023-12-22 12:54:21,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=594253.3333333334, ans=0.0 2023-12-22 12:54:25,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=594253.3333333334, ans=0.0 2023-12-22 12:54:41,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=594386.6666666666, ans=0.2 2023-12-22 12:54:52,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=594453.3333333334, ans=0.125 2023-12-22 12:54:53,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=594453.3333333334, ans=22.5 2023-12-22 12:54:55,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.21 vs. limit=22.5 2023-12-22 12:54:56,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.97 vs. limit=22.5 2023-12-22 12:55:06,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=594520.0, ans=0.0 2023-12-22 12:55:10,461 INFO [train.py:886] (1/4) Epoch 19, batch 3400, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4943198.03 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:56:03,789 INFO [train.py:886] (1/4) Epoch 19, batch 3450, loss[loss=0.01637, audio_tagging_loss=0.01637, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4942728.37 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:56:05,663 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.655e+01 2.881e+01 2.999e+01 3.150e+01 3.664e+01, threshold=5.998e+01, percent-clipped=0.0 2023-12-22 12:56:43,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=595186.6666666666, ans=0.0 2023-12-22 12:56:53,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=595186.6666666666, ans=0.0 2023-12-22 12:56:55,976 INFO [train.py:886] (1/4) Epoch 19, batch 3500, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4939384.59 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:57:03,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595253.3333333334, ans=0.1 2023-12-22 12:57:04,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. 
limit=22.5 2023-12-22 12:57:11,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595320.0, ans=0.1 2023-12-22 12:57:11,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=595320.0, ans=0.125 2023-12-22 12:57:30,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=595453.3333333334, ans=0.0 2023-12-22 12:57:46,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=595586.6666666666, ans=0.0 2023-12-22 12:57:46,944 INFO [train.py:886] (1/4) Epoch 19, batch 3550, loss[loss=0.01514, audio_tagging_loss=0.01514, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4944834.55 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:57:49,601 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 2.876e+01 3.031e+01 3.189e+01 3.844e+01, threshold=6.062e+01, percent-clipped=0.0 2023-12-22 12:57:59,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.14 vs. limit=22.5 2023-12-22 12:58:02,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.32 vs. limit=10.0 2023-12-22 12:58:07,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595720.0, ans=0.1 2023-12-22 12:58:26,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=595786.6666666666, ans=0.125 2023-12-22 12:58:39,946 INFO [train.py:886] (1/4) Epoch 19, batch 3600, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4944659.86 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 12:58:45,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=595920.0, ans=0.05 2023-12-22 12:58:50,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=595986.6666666666, ans=0.1 2023-12-22 12:58:54,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=595986.6666666666, ans=0.05 2023-12-22 12:59:01,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=596053.3333333334, ans=0.09899494936611666 2023-12-22 12:59:02,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2023-12-22 12:59:09,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596053.3333333334, ans=0.1 2023-12-22 12:59:10,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. 
limit=15.0 2023-12-22 12:59:24,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596186.6666666666, ans=0.1 2023-12-22 12:59:26,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. limit=10.0 2023-12-22 12:59:32,379 INFO [train.py:886] (1/4) Epoch 19, batch 3650, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4951021.99 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 12:59:34,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2023-12-22 12:59:35,051 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.592e+01 2.803e+01 2.946e+01 3.056e+01 3.583e+01, threshold=5.891e+01, percent-clipped=0.0 2023-12-22 13:00:05,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-12-22 13:00:08,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=596453.3333333334, ans=0.0 2023-12-22 13:00:23,288 INFO [train.py:886] (1/4) Epoch 19, batch 3700, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4954494.27 frames. ], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:00:49,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=596720.0, ans=0.0 2023-12-22 13:00:53,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=596786.6666666666, ans=0.125 2023-12-22 13:01:03,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=596853.3333333334, ans=0.125 2023-12-22 13:01:12,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=596853.3333333334, ans=0.09899494936611666 2023-12-22 13:01:15,894 INFO [train.py:886] (1/4) Epoch 19, batch 3750, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4955417.92 frames. ], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:01:17,774 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.917e+01 3.055e+01 3.190e+01 3.624e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 13:01:22,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=596920.0, ans=0.95 2023-12-22 13:01:27,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.32 vs. 
limit=15.0 2023-12-22 13:01:38,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597053.3333333334, ans=0.1 2023-12-22 13:01:43,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=597053.3333333334, ans=0.0 2023-12-22 13:01:48,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2023-12-22 13:01:57,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597186.6666666666, ans=0.125 2023-12-22 13:02:06,086 INFO [train.py:886] (1/4) Epoch 19, batch 3800, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4952064.67 frames. ], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:02:06,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.70 vs. limit=10.0 2023-12-22 13:02:12,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597253.3333333334, ans=0.1 2023-12-22 13:02:18,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=597320.0, ans=0.2 2023-12-22 13:02:18,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=597320.0, ans=0.2 2023-12-22 13:02:47,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=597520.0, ans=0.125 2023-12-22 13:02:57,558 INFO [train.py:886] (1/4) Epoch 19, batch 3850, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4948991.72 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:02:59,402 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+01 2.917e+01 3.058e+01 3.165e+01 3.564e+01, threshold=6.116e+01, percent-clipped=0.0 2023-12-22 13:02:59,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=597586.6666666666, ans=0.0 2023-12-22 13:03:00,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=597586.6666666666, ans=0.05 2023-12-22 13:03:31,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=597786.6666666666, ans=0.0 2023-12-22 13:03:39,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=597853.3333333334, ans=0.0 2023-12-22 13:03:47,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=597853.3333333334, ans=0.2 2023-12-22 13:03:49,342 INFO [train.py:886] (1/4) Epoch 19, batch 3900, loss[loss=0.01073, audio_tagging_loss=0.01073, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4953656.32 frames. 
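The slowly decaying lr column (5.76e-03 down to 5.67e-03 across this stretch of epoch 19) is consistent with icefall's Eden-style schedule, which decays smoothly in both the global batch index and the (fractional) epoch. A hedged reconstruction of that formula, ignoring the early warm-up factor; see icefall's optim.py for the authoritative form:

```python
def eden_lr_sketch(base_lr: float, batch: int, epoch: float,
                   lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay, smooth in both batch index and epoch."""
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

# deep into training both factors change very slowly, which is why the
# logged lr only creeps downward over thousands of batches; with an
# assumed global batch index of ~86k this lands in the logged range:
print(eden_lr_sketch(0.045, batch=86_000, epoch=19.0))  # ~5.6e-03
```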
], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:04:09,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=598053.3333333334, ans=0.0 2023-12-22 13:04:35,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=598186.6666666666, ans=0.125 2023-12-22 13:04:36,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=598186.6666666666, ans=0.0 2023-12-22 13:04:39,095 INFO [train.py:886] (1/4) Epoch 19, batch 3950, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4954052.84 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:04:40,998 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 2.903e+01 2.997e+01 3.155e+01 3.492e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 13:04:44,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=598253.3333333334, ans=0.5 2023-12-22 13:05:00,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598386.6666666666, ans=0.1 2023-12-22 13:05:05,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=598386.6666666666, ans=0.2 2023-12-22 13:05:12,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2023-12-22 13:05:25,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-12-22 13:05:31,247 INFO [train.py:886] (1/4) Epoch 19, batch 4000, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4957936.50 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:05:39,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=598586.6666666666, ans=0.125 2023-12-22 13:05:47,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2023-12-22 13:05:50,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=598720.0, ans=0.125 2023-12-22 13:05:54,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=598720.0, ans=0.125 2023-12-22 13:05:55,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=598720.0, ans=0.125 2023-12-22 13:05:55,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=598720.0, ans=0.125 2023-12-22 13:06:21,984 INFO [train.py:886] (1/4) Epoch 19, batch 4050, loss[loss=0.01227, audio_tagging_loss=0.01227, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4950262.21 frames. 
], batch size: 99, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:06:24,466 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.925e+01 3.047e+01 3.151e+01 3.558e+01, threshold=6.093e+01, percent-clipped=0.0 2023-12-22 13:06:29,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=598920.0, ans=0.0 2023-12-22 13:06:34,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5 2023-12-22 13:06:45,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=599053.3333333334, ans=0.125 2023-12-22 13:07:14,255 INFO [train.py:886] (1/4) Epoch 19, batch 4100, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4947328.49 frames. ], batch size: 99, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:07:20,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=599253.3333333334, ans=0.0 2023-12-22 13:07:26,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599320.0, ans=0.0 2023-12-22 13:07:30,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5 2023-12-22 13:07:41,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=599386.6666666666, ans=0.0 2023-12-22 13:07:51,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=599453.3333333334, ans=0.125 2023-12-22 13:07:54,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-12-22 13:08:06,025 INFO [train.py:886] (1/4) Epoch 19, batch 4150, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4944685.96 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:08:07,966 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 2.927e+01 3.054e+01 3.225e+01 3.796e+01, threshold=6.108e+01, percent-clipped=0.0 2023-12-22 13:08:08,218 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:08:12,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=599586.6666666666, ans=0.125 2023-12-22 13:08:23,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-22 13:08:24,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. 
limit=22.5 2023-12-22 13:08:29,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=599720.0, ans=0.125 2023-12-22 13:08:49,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=599853.3333333334, ans=10.0 2023-12-22 13:08:51,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=12.0 2023-12-22 13:08:55,597 INFO [train.py:886] (1/4) Epoch 19, batch 4200, loss[loss=0.01403, audio_tagging_loss=0.01403, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4943947.61 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:09:01,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=599920.0, ans=0.125 2023-12-22 13:09:28,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-12-22 13:09:41,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=600186.6666666666, ans=0.125 2023-12-22 13:09:43,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=600186.6666666666, ans=0.125 2023-12-22 13:09:45,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=600186.6666666666, ans=0.125 2023-12-22 13:09:48,161 INFO [train.py:886] (1/4) Epoch 19, batch 4250, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4946639.57 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:09:48,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=600253.3333333334, ans=0.2 2023-12-22 13:09:50,966 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.908e+01 3.025e+01 3.145e+01 3.602e+01, threshold=6.050e+01, percent-clipped=0.0 2023-12-22 13:09:59,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-12-22 13:10:04,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=600320.0, ans=0.05 2023-12-22 13:10:12,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=600386.6666666666, ans=0.125 2023-12-22 13:10:17,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=600386.6666666666, ans=0.125 2023-12-22 13:10:23,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.20 vs. limit=22.5 2023-12-22 13:10:34,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=600520.0, ans=0.0 2023-12-22 13:10:39,406 INFO [train.py:886] (1/4) Epoch 19, batch 4300, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. 
], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4954887.71 frames. ], batch size: 99, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:10:47,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=600586.6666666666, ans=0.125 2023-12-22 13:10:57,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-12-22 13:10:58,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=600653.3333333334, ans=0.125 2023-12-22 13:11:01,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=600720.0, ans=0.125 2023-12-22 13:11:05,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=600720.0, ans=0.0 2023-12-22 13:11:32,040 INFO [train.py:886] (1/4) Epoch 19, batch 4350, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4953184.64 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:11:32,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600920.0, ans=0.1 2023-12-22 13:11:34,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.519e+01 2.892e+01 3.027e+01 3.169e+01 3.833e+01, threshold=6.053e+01, percent-clipped=0.0 2023-12-22 13:11:52,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=601053.3333333334, ans=10.0 2023-12-22 13:11:53,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=601053.3333333334, ans=0.02 2023-12-22 13:12:03,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=601120.0, ans=0.07 2023-12-22 13:12:15,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=601186.6666666666, ans=0.0 2023-12-22 13:12:15,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601186.6666666666, ans=0.125 2023-12-22 13:12:19,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=601186.6666666666, ans=0.0 2023-12-22 13:12:20,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601186.6666666666, ans=0.1 2023-12-22 13:12:21,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=601186.6666666666, ans=0.2 2023-12-22 13:12:23,991 INFO [train.py:886] (1/4) Epoch 19, batch 4400, loss[loss=0.01615, audio_tagging_loss=0.01615, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4950453.15 frames. 
], batch size: 99, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:12:34,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=601320.0, ans=0.015 2023-12-22 13:12:40,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=12.0 2023-12-22 13:12:45,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=601386.6666666666, ans=0.125 2023-12-22 13:12:57,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=601453.3333333334, ans=0.0 2023-12-22 13:13:08,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601520.0, ans=0.125 2023-12-22 13:13:15,230 INFO [train.py:886] (1/4) Epoch 19, batch 4450, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4949137.11 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:13:19,443 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.870e+01 3.001e+01 3.186e+01 3.529e+01, threshold=6.001e+01, percent-clipped=0.0 2023-12-22 13:13:27,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=601653.3333333334, ans=0.0 2023-12-22 13:13:43,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2023-12-22 13:13:55,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=601786.6666666666, ans=0.0 2023-12-22 13:14:02,723 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.024e-02 2023-12-22 13:14:07,311 INFO [train.py:886] (1/4) Epoch 19, batch 4500, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4948444.23 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:14:09,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-12-22 13:14:14,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=601920.0, ans=0.0 2023-12-22 13:14:42,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.49 vs. limit=15.0 2023-12-22 13:14:42,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=602120.0, ans=0.125 2023-12-22 13:14:59,672 INFO [train.py:886] (1/4) Epoch 19, batch 4550, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4956112.13 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:15:01,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. 
limit=15.0 2023-12-22 13:15:02,431 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.607e+01 2.855e+01 2.993e+01 3.155e+01 3.667e+01, threshold=5.986e+01, percent-clipped=0.0 2023-12-22 13:15:09,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=602320.0, ans=0.125 2023-12-22 13:15:33,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=602453.3333333334, ans=0.0 2023-12-22 13:15:49,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.20 vs. limit=22.5 2023-12-22 13:15:49,760 INFO [train.py:886] (1/4) Epoch 19, batch 4600, loss[loss=0.01368, audio_tagging_loss=0.01368, over 21949.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4953757.59 frames. ], batch size: 107, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:16:18,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=602720.0, ans=0.125 2023-12-22 13:16:41,041 INFO [train.py:886] (1/4) Epoch 19, batch 4650, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4955468.02 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:16:43,886 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 2.933e+01 3.063e+01 3.173e+01 3.851e+01, threshold=6.126e+01, percent-clipped=0.0 2023-12-22 13:16:45,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=602920.0, ans=0.125 2023-12-22 13:16:48,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=602920.0, ans=0.09899494936611666 2023-12-22 13:16:51,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=602986.6666666666, ans=0.125 2023-12-22 13:16:55,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=602986.6666666666, ans=0.2 2023-12-22 13:16:56,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=602986.6666666666, ans=0.0 2023-12-22 13:16:56,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602986.6666666666, ans=0.1 2023-12-22 13:16:56,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2023-12-22 13:17:04,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 13:17:06,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=603053.3333333334, ans=0.0 2023-12-22 13:17:19,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=603120.0, ans=0.0 2023-12-22 13:17:30,327 INFO [train.py:886] (1/4) Epoch 19, batch 4700, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4947179.89 frames. 
], batch size: 99, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:17:36,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=603253.3333333334, ans=0.125 2023-12-22 13:17:42,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=603320.0, ans=0.125 2023-12-22 13:17:42,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2023-12-22 13:17:45,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2023-12-22 13:18:14,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=603520.0, ans=0.0 2023-12-22 13:18:17,481 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.525e-03 2023-12-22 13:18:18,161 INFO [train.py:886] (1/4) Epoch 19, batch 4750, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4945161.02 frames. ], batch size: 99, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:18:20,766 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.666e+01 2.986e+01 3.070e+01 3.231e+01 3.664e+01, threshold=6.140e+01, percent-clipped=0.0 2023-12-22 13:18:51,858 INFO [train.py:886] (1/4) Epoch 20, batch 0, loss[loss=0.03496, audio_tagging_loss=0.03496, over 20639.00 frames. ], tot_loss[loss=0.03496, audio_tagging_loss=0.03496, over 20639.00 frames. ], batch size: 107, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:18:51,859 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 13:19:12,480 INFO [train.py:917] (1/4) Epoch 20, validation: loss=0.03315, audio_tagging_loss=0.03315, over 3737520.00 frames. 2023-12-22 13:19:12,481 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 13:19:20,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-12-22 13:19:22,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=603760.0, ans=0.0 2023-12-22 13:19:54,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=603960.0, ans=0.07 2023-12-22 13:19:56,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=603960.0, ans=0.125 2023-12-22 13:20:04,378 INFO [train.py:886] (1/4) Epoch 20, batch 50, loss[loss=0.01683, audio_tagging_loss=0.01683, over 25000.00 frames. ], tot_loss[loss=0.02164, audio_tagging_loss=0.02164, over 1113662.82 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:20:07,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.80 vs. 
limit=12.0 2023-12-22 13:20:11,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=604026.6666666666, ans=0.2 2023-12-22 13:20:13,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-12-22 13:20:15,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604093.3333333334, ans=0.1 2023-12-22 13:20:42,980 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.780e+01 3.377e+01 3.713e+01 4.293e+01 9.552e+01, threshold=7.426e+01, percent-clipped=7.0 2023-12-22 13:20:54,303 INFO [train.py:886] (1/4) Epoch 20, batch 100, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 1963244.74 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:21:08,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=604426.6666666666, ans=0.2 2023-12-22 13:21:15,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=604493.3333333334, ans=0.0 2023-12-22 13:21:17,194 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:21:20,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=604493.3333333334, ans=0.04949747468305833 2023-12-22 13:21:25,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=604560.0, ans=0.0 2023-12-22 13:21:30,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=604560.0, ans=0.125 2023-12-22 13:21:33,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604560.0, ans=0.125 2023-12-22 13:21:36,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604626.6666666666, ans=0.1 2023-12-22 13:21:46,247 INFO [train.py:886] (1/4) Epoch 20, batch 150, loss[loss=0.01663, audio_tagging_loss=0.01663, over 24750.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 2622414.31 frames. 
], batch size: 99, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:21:50,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604693.3333333334, ans=0.1 2023-12-22 13:21:51,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604693.3333333334, ans=0.125 2023-12-22 13:22:19,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604893.3333333334, ans=0.125 2023-12-22 13:22:24,478 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.643e+01 2.963e+01 3.102e+01 3.243e+01 3.699e+01, threshold=6.204e+01, percent-clipped=0.0 2023-12-22 13:22:29,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=604960.0, ans=0.125 2023-12-22 13:22:33,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-12-22 13:22:35,859 INFO [train.py:886] (1/4) Epoch 20, batch 200, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01606, audio_tagging_loss=0.01606, over 3142378.12 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:22:37,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2023-12-22 13:22:50,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.61 vs. limit=22.5 2023-12-22 13:22:53,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5 2023-12-22 13:23:01,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=605160.0, ans=0.125 2023-12-22 13:23:08,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=605226.6666666666, ans=0.125 2023-12-22 13:23:16,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=605293.3333333334, ans=0.125 2023-12-22 13:23:21,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=605293.3333333334, ans=0.0 2023-12-22 13:23:25,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=605293.3333333334, ans=0.125 2023-12-22 13:23:27,220 INFO [train.py:886] (1/4) Epoch 20, batch 250, loss[loss=0.01121, audio_tagging_loss=0.01121, over 24008.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 3547447.66 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:23:41,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. 
limit=10.0 2023-12-22 13:23:54,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=605493.3333333334, ans=0.035 2023-12-22 13:24:01,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=605560.0, ans=0.0 2023-12-22 13:24:04,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605560.0, ans=0.125 2023-12-22 13:24:04,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. limit=15.0 2023-12-22 13:24:05,957 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 2.907e+01 3.036e+01 3.184e+01 3.948e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 13:24:17,996 INFO [train.py:886] (1/4) Epoch 20, batch 300, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 3860244.69 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:24:45,440 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:25:08,345 INFO [train.py:886] (1/4) Epoch 20, batch 350, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4095882.73 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:25:36,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=606160.0, ans=0.2 2023-12-22 13:25:41,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=606226.6666666666, ans=0.04949747468305833 2023-12-22 13:25:44,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=606226.6666666666, ans=0.125 2023-12-22 13:25:47,582 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+01 2.893e+01 2.999e+01 3.154e+01 3.798e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 13:25:49,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=606293.3333333334, ans=0.0 2023-12-22 13:26:00,540 INFO [train.py:886] (1/4) Epoch 20, batch 400, loss[loss=0.01498, audio_tagging_loss=0.01498, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4286244.41 frames. 
], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:26:06,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=606360.0, ans=0.125 2023-12-22 13:26:13,930 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:26:19,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=606493.3333333334, ans=0.0 2023-12-22 13:26:20,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=606493.3333333334, ans=0.125 2023-12-22 13:26:23,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=606493.3333333334, ans=0.2 2023-12-22 13:26:50,007 INFO [train.py:886] (1/4) Epoch 20, batch 450, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4429830.49 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:27:02,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=606760.0, ans=0.125 2023-12-22 13:27:06,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=606760.0, ans=0.125 2023-12-22 13:27:11,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=606826.6666666666, ans=0.125 2023-12-22 13:27:29,402 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.863e+01 2.962e+01 3.098e+01 3.891e+01, threshold=5.924e+01, percent-clipped=0.0 2023-12-22 13:27:29,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606893.3333333334, ans=0.1 2023-12-22 13:27:41,510 INFO [train.py:886] (1/4) Epoch 20, batch 500, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4542255.16 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:28:18,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=607226.6666666666, ans=0.125 2023-12-22 13:28:33,276 INFO [train.py:886] (1/4) Epoch 20, batch 550, loss[loss=0.01474, audio_tagging_loss=0.01474, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4634735.87 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:28:36,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-12-22 13:28:51,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. 
limit=15.0 2023-12-22 13:28:52,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=607493.3333333334, ans=0.2 2023-12-22 13:29:12,563 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+01 2.894e+01 3.022e+01 3.158e+01 3.590e+01, threshold=6.043e+01, percent-clipped=0.0 2023-12-22 13:29:15,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=607626.6666666666, ans=0.0 2023-12-22 13:29:19,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=607626.6666666666, ans=0.0 2023-12-22 13:29:20,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=607626.6666666666, ans=0.2 2023-12-22 13:29:21,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607626.6666666666, ans=0.1 2023-12-22 13:29:22,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=607626.6666666666, ans=0.125 2023-12-22 13:29:23,972 INFO [train.py:886] (1/4) Epoch 20, batch 600, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4703389.23 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:29:43,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=607760.0, ans=0.0 2023-12-22 13:29:44,280 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.855e-01 2023-12-22 13:30:06,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=607960.0, ans=0.2 2023-12-22 13:30:07,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=607960.0, ans=0.0 2023-12-22 13:30:15,644 INFO [train.py:886] (1/4) Epoch 20, batch 650, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4754239.09 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:30:16,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=608026.6666666666, ans=0.125 2023-12-22 13:30:16,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=608026.6666666666, ans=0.0 2023-12-22 13:30:27,040 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:30:37,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-12-22 13:30:54,378 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.917e+01 3.051e+01 3.204e+01 3.727e+01, threshold=6.102e+01, percent-clipped=0.0 2023-12-22 13:31:06,496 INFO [train.py:886] (1/4) Epoch 20, batch 700, loss[loss=0.01472, audio_tagging_loss=0.01472, over 22321.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4793150.37 frames. 
], batch size: 107, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:31:15,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.89 vs. limit=22.5 2023-12-22 13:31:17,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-12-22 13:31:32,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608493.3333333334, ans=0.125 2023-12-22 13:31:57,665 INFO [train.py:886] (1/4) Epoch 20, batch 750, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4828293.49 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:32:01,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=608693.3333333334, ans=0.125 2023-12-22 13:32:24,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=608826.6666666666, ans=0.125 2023-12-22 13:32:29,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=608893.3333333334, ans=0.125 2023-12-22 13:32:35,945 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.906e+01 3.017e+01 3.126e+01 3.757e+01, threshold=6.033e+01, percent-clipped=0.0 2023-12-22 13:32:39,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=608960.0, ans=0.0 2023-12-22 13:32:49,577 INFO [train.py:886] (1/4) Epoch 20, batch 800, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4858149.55 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:32:52,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=609026.6666666666, ans=0.0 2023-12-22 13:32:53,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-22 13:33:03,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=609093.3333333334, ans=0.1 2023-12-22 13:33:06,593 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:33:26,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2023-12-22 13:33:26,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-12-22 13:33:29,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=609293.3333333334, ans=0.0 2023-12-22 13:33:40,122 INFO [train.py:886] (1/4) Epoch 20, batch 850, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4880856.97 frames. 
], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:33:48,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=609360.0, ans=0.2 2023-12-22 13:33:49,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=609360.0, ans=0.2 2023-12-22 13:33:50,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. limit=10.0 2023-12-22 13:33:58,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=609426.6666666666, ans=0.0 2023-12-22 13:34:10,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=609560.0, ans=0.125 2023-12-22 13:34:19,520 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.642e+01 2.973e+01 3.095e+01 3.258e+01 3.569e+01, threshold=6.190e+01, percent-clipped=0.0 2023-12-22 13:34:25,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=609626.6666666666, ans=0.125 2023-12-22 13:34:32,286 INFO [train.py:886] (1/4) Epoch 20, batch 900, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4898051.55 frames. ], batch size: 99, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:34:54,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0 2023-12-22 13:34:59,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=609826.6666666666, ans=0.125 2023-12-22 13:35:04,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=609893.3333333334, ans=0.0 2023-12-22 13:35:09,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=609893.3333333334, ans=0.0 2023-12-22 13:35:12,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-12-22 13:35:24,646 INFO [train.py:886] (1/4) Epoch 20, batch 950, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4904238.07 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:36:01,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610226.6666666666, ans=0.125 2023-12-22 13:36:03,994 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 2.893e+01 3.045e+01 3.239e+01 3.538e+01, threshold=6.090e+01, percent-clipped=0.0 2023-12-22 13:36:06,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=610293.3333333334, ans=0.125 2023-12-22 13:36:15,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=610360.0, ans=0.0 2023-12-22 13:36:16,085 INFO [train.py:886] (1/4) Epoch 20, batch 1000, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. 
], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4911276.68 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:36:20,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=610360.0, ans=0.0 2023-12-22 13:36:20,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=610360.0, ans=0.2 2023-12-22 13:37:08,811 INFO [train.py:886] (1/4) Epoch 20, batch 1050, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4920100.54 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:37:23,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=12.0 2023-12-22 13:37:25,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=610760.0, ans=0.025 2023-12-22 13:37:28,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=610826.6666666666, ans=0.0 2023-12-22 13:37:30,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=610826.6666666666, ans=0.0 2023-12-22 13:37:43,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610893.3333333334, ans=0.125 2023-12-22 13:37:47,459 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.594e+01 2.869e+01 3.046e+01 3.181e+01 3.757e+01, threshold=6.091e+01, percent-clipped=0.0 2023-12-22 13:37:51,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-12-22 13:37:52,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610960.0, ans=0.1 2023-12-22 13:37:55,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=610960.0, ans=0.125 2023-12-22 13:38:00,258 INFO [train.py:886] (1/4) Epoch 20, batch 1100, loss[loss=0.01387, audio_tagging_loss=0.01387, over 23974.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4918249.58 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:38:27,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=611160.0, ans=0.125 2023-12-22 13:38:33,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2023-12-22 13:38:39,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2023-12-22 13:38:50,980 INFO [train.py:886] (1/4) Epoch 20, batch 1150, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4928672.58 frames. 
], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:39:12,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=611493.3333333334, ans=0.0 2023-12-22 13:39:21,884 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.993e-01 2023-12-22 13:39:29,145 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+01 2.913e+01 3.020e+01 3.168e+01 3.585e+01, threshold=6.039e+01, percent-clipped=0.0 2023-12-22 13:39:42,698 INFO [train.py:886] (1/4) Epoch 20, batch 1200, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4934371.66 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:39:46,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=611693.3333333334, ans=0.0 2023-12-22 13:39:59,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=611760.0, ans=0.0 2023-12-22 13:40:09,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611826.6666666666, ans=0.1 2023-12-22 13:40:14,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.94 vs. limit=6.0 2023-12-22 13:40:14,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=611893.3333333334, ans=0.125 2023-12-22 13:40:31,869 INFO [train.py:886] (1/4) Epoch 20, batch 1250, loss[loss=0.01789, audio_tagging_loss=0.01789, over 24944.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4937302.24 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0 2023-12-22 13:40:38,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612026.6666666666, ans=0.1 2023-12-22 13:40:52,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=612160.0, ans=0.125 2023-12-22 13:41:10,088 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.713e+01 2.956e+01 3.075e+01 3.204e+01 3.822e+01, threshold=6.150e+01, percent-clipped=0.0 2023-12-22 13:41:11,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=612293.3333333334, ans=10.0 2023-12-22 13:41:15,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612293.3333333334, ans=0.1 2023-12-22 13:41:20,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-12-22 13:41:21,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=612360.0, ans=0.125 2023-12-22 13:41:22,078 INFO [train.py:886] (1/4) Epoch 20, batch 1300, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4934386.36 frames. 
], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:41:27,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2023-12-22 13:41:41,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0 2023-12-22 13:41:51,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=612493.3333333334, ans=0.0 2023-12-22 13:42:02,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5 2023-12-22 13:42:07,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=612626.6666666666, ans=0.125 2023-12-22 13:42:13,396 INFO [train.py:886] (1/4) Epoch 20, batch 1350, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4937965.67 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:42:15,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=612693.3333333334, ans=0.02 2023-12-22 13:42:15,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=612693.3333333334, ans=0.2 2023-12-22 13:42:25,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-12-22 13:42:29,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=612760.0, ans=0.1 2023-12-22 13:42:38,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2023-12-22 13:42:49,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=612893.3333333334, ans=0.0 2023-12-22 13:42:51,498 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+01 2.862e+01 2.982e+01 3.143e+01 3.555e+01, threshold=5.964e+01, percent-clipped=0.0 2023-12-22 13:42:51,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=612893.3333333334, ans=0.0 2023-12-22 13:43:02,829 INFO [train.py:886] (1/4) Epoch 20, batch 1400, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4936818.14 frames. 
], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:43:18,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=613093.3333333334, ans=0.0 2023-12-22 13:43:23,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=613160.0, ans=0.125 2023-12-22 13:43:30,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=613160.0, ans=0.125 2023-12-22 13:43:31,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=613226.6666666666, ans=0.125 2023-12-22 13:43:31,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=613226.6666666666, ans=0.125 2023-12-22 13:43:36,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=613226.6666666666, ans=0.125 2023-12-22 13:43:37,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=613226.6666666666, ans=0.125 2023-12-22 13:43:40,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=613226.6666666666, ans=0.0 2023-12-22 13:43:47,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-12-22 13:43:49,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.78 vs. limit=22.5 2023-12-22 13:43:54,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=613293.3333333334, ans=0.125 2023-12-22 13:43:56,514 INFO [train.py:886] (1/4) Epoch 20, batch 1450, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4947030.38 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:43:57,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=613360.0, ans=0.125 2023-12-22 13:44:32,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=613560.0, ans=0.125 2023-12-22 13:44:34,561 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.829e+01 3.018e+01 3.145e+01 3.579e+01, threshold=6.037e+01, percent-clipped=0.0 2023-12-22 13:44:45,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=613693.3333333334, ans=0.0 2023-12-22 13:44:45,941 INFO [train.py:886] (1/4) Epoch 20, batch 1500, loss[loss=0.01725, audio_tagging_loss=0.01725, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4947711.68 frames. 
], batch size: 100, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:44:46,195 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:44:51,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=613693.3333333334, ans=0.0 2023-12-22 13:44:55,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=613693.3333333334, ans=0.0 2023-12-22 13:45:03,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=613760.0, ans=0.125 2023-12-22 13:45:20,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-12-22 13:45:31,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613960.0, ans=0.1 2023-12-22 13:45:31,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=613960.0, ans=0.125 2023-12-22 13:45:37,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2023-12-22 13:45:38,431 INFO [train.py:886] (1/4) Epoch 20, batch 1550, loss[loss=0.01657, audio_tagging_loss=0.01657, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4946089.13 frames. ], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:46:06,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=614160.0, ans=0.0 2023-12-22 13:46:15,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=614226.6666666666, ans=0.2 2023-12-22 13:46:16,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+01 2.942e+01 3.051e+01 3.184e+01 5.064e+01, threshold=6.103e+01, percent-clipped=0.0 2023-12-22 13:46:29,800 INFO [train.py:886] (1/4) Epoch 20, batch 1600, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4943507.72 frames. ], batch size: 99, lr: 5.46e-03, grad_scale: 32.0 2023-12-22 13:46:40,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=614426.6666666666, ans=0.125 2023-12-22 13:46:40,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=614426.6666666666, ans=0.125 2023-12-22 13:46:56,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. 
limit=15.0 2023-12-22 13:47:04,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=614560.0, ans=0.125 2023-12-22 13:47:07,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=614560.0, ans=10.0 2023-12-22 13:47:11,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=614626.6666666666, ans=0.125 2023-12-22 13:47:15,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=614626.6666666666, ans=0.125 2023-12-22 13:47:16,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=614626.6666666666, ans=15.0 2023-12-22 13:47:20,711 INFO [train.py:886] (1/4) Epoch 20, batch 1650, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4936373.11 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:47:22,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=614693.3333333334, ans=0.07 2023-12-22 13:47:25,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2023-12-22 13:47:40,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=614760.0, ans=0.125 2023-12-22 13:47:59,616 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.897e+01 3.020e+01 3.160e+01 3.964e+01, threshold=6.040e+01, percent-clipped=0.0 2023-12-22 13:48:00,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=12.0 2023-12-22 13:48:05,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614960.0, ans=0.0 2023-12-22 13:48:12,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615026.6666666666, ans=0.125 2023-12-22 13:48:13,739 INFO [train.py:886] (1/4) Epoch 20, batch 1700, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4946307.87 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:48:49,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. 
limit=10.0 2023-12-22 13:48:52,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=615293.3333333334, ans=0.125 2023-12-22 13:48:55,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=615293.3333333334, ans=0.07 2023-12-22 13:48:57,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=615293.3333333334, ans=0.07 2023-12-22 13:49:02,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=615360.0, ans=0.2 2023-12-22 13:49:02,947 INFO [train.py:886] (1/4) Epoch 20, batch 1750, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4951692.69 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:49:03,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=615360.0, ans=0.05 2023-12-22 13:49:14,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-22 13:49:14,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=615426.6666666666, ans=0.0 2023-12-22 13:49:24,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=615493.3333333334, ans=0.2 2023-12-22 13:49:27,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615493.3333333334, ans=0.1 2023-12-22 13:49:32,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=615560.0, ans=0.015 2023-12-22 13:49:37,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=615560.0, ans=0.125 2023-12-22 13:49:42,939 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+01 2.888e+01 2.963e+01 3.094e+01 5.510e+01, threshold=5.927e+01, percent-clipped=0.0 2023-12-22 13:49:44,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=615626.6666666666, ans=0.125 2023-12-22 13:49:54,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=615693.3333333334, ans=0.125 2023-12-22 13:49:54,997 INFO [train.py:886] (1/4) Epoch 20, batch 1800, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4954559.88 frames. 
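[Annotation] The scaling.py:1022 Whitening lines compare a measured statistic of a layer's activations against a scheduled limit, and occasionally the metric exceeds it (e.g. "metric=15.10 vs. limit=15.0" above). A natural whiteness measure, assumed for the sketch below, is d * sum(l_i^2) / (sum l_i)^2 over the eigenvalues l_i of the per-group feature covariance: it equals 1.0 exactly when the covariance is a multiple of the identity (fully "white") and grows as energy concentrates in fewer directions. The log only records the comparison; the corrective behaviour the module applies when the limit is exceeded is not shown here.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """Assumed whiteness statistic: d * sum(l^2) / (sum l)^2 per group,
    averaged over groups. This is a plausible reconstruction, not a copy
    of icefall's scaling.py."""
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, d)
    metrics = []
    for g in range(num_groups):
        xc = xg[:, g, :] - xg[:, g, :].mean(dim=0)   # center per channel
        cov = xc.t() @ xc / num_frames               # (d, d) covariance
        eigs = torch.linalg.eigvalsh(cov)            # real, ascending
        metrics.append(d * (eigs ** 2).sum() / eigs.sum() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 256)
print(whitening_metric(x, num_groups=1))   # near 1.0 for white features
x[:, 0] *= 20.0                            # concentrate energy in one channel
print(whitening_metric(x, num_groups=1))   # metric grows far above 1
```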
], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:50:03,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615760.0, ans=0.1 2023-12-22 13:50:21,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=615826.6666666666, ans=0.0 2023-12-22 13:50:30,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=615893.3333333334, ans=0.125 2023-12-22 13:50:32,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=615893.3333333334, ans=0.125 2023-12-22 13:50:41,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615960.0, ans=0.125 2023-12-22 13:50:47,411 INFO [train.py:886] (1/4) Epoch 20, batch 1850, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4950292.42 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:50:48,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=616026.6666666666, ans=0.125 2023-12-22 13:50:57,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=616093.3333333334, ans=0.0 2023-12-22 13:51:02,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-12-22 13:51:25,867 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+01 2.959e+01 3.073e+01 3.205e+01 3.995e+01, threshold=6.146e+01, percent-clipped=0.0 2023-12-22 13:51:28,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=616293.3333333334, ans=0.0 2023-12-22 13:51:35,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0 2023-12-22 13:51:38,099 INFO [train.py:886] (1/4) Epoch 20, batch 1900, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4946757.44 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0 2023-12-22 13:51:38,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=616360.0, ans=0.07 2023-12-22 13:51:39,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=616360.0, ans=0.0 2023-12-22 13:51:49,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=616426.6666666666, ans=0.0 2023-12-22 13:52:12,155 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.810e-02 2023-12-22 13:52:21,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=616626.6666666666, ans=0.125 2023-12-22 13:52:30,207 INFO [train.py:886] (1/4) Epoch 20, batch 1950, loss[loss=0.01409, audio_tagging_loss=0.01409, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4943403.77 frames. 
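[Annotation] The periodic optim.py:484 WARNING lines summarize recent gradient norms as five order statistics (min, 25%, median, 75%, max) plus a clipping threshold. The numbers themselves show the rule: the threshold is Clipping_scale (2.0) times the recent median, e.g. 6.146e+01 = 2 x 3.073e+01 in the entry above, and percent-clipped stays at 0.0 because all observed norms sit well below it. A sketch of how such a diagnostic can be produced from a sliding window of norms; the window size and the exact bookkeeping are assumptions, not necessarily ScaledAdam's internals:

```python
from collections import deque
import torch

class GradNormMonitor:
    def __init__(self, model: torch.nn.Module, window: int = 200,
                 clipping_scale: float = 2.0):
        self.model = model
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)   # sliding window of grad norms
        self.clipped = 0
        self.seen = 0

    def step(self) -> None:
        norm = torch.norm(torch.stack([
            p.grad.norm() for p in self.model.parameters()
            if p.grad is not None]))
        self.norms.append(norm)
        self.seen += 1
        if len(self.norms) < 4:
            return
        q = torch.quantile(torch.stack(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2]       # observed rule: 2x recent median
        if norm > threshold:
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), threshold)
            self.clipped += 1
        print(f"grad-norm quartiles {q[0]:.3e} {q[1]:.3e} {q[2]:.3e} "
              f"{q[3]:.3e} {q[4]:.3e}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.clipped / self.seen:.1f}")

model = torch.nn.Linear(4, 4)
monitor = GradNormMonitor(model)
for _ in range(6):
    model(torch.randn(2, 4)).sum().backward()
    monitor.step()
```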
], batch size: 100, lr: 5.44e-03, grad_scale: 32.0 2023-12-22 13:52:34,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2023-12-22 13:52:38,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=616693.3333333334, ans=0.2 2023-12-22 13:52:50,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=616826.6666666666, ans=0.0 2023-12-22 13:52:50,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=616826.6666666666, ans=0.125 2023-12-22 13:53:09,469 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.904e+01 3.050e+01 3.172e+01 4.406e+01, threshold=6.100e+01, percent-clipped=0.0 2023-12-22 13:53:11,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=616960.0, ans=0.1 2023-12-22 13:53:14,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2023-12-22 13:53:15,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=616960.0, ans=0.0 2023-12-22 13:53:22,132 INFO [train.py:886] (1/4) Epoch 20, batch 2000, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4951239.33 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:53:45,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=617160.0, ans=0.2 2023-12-22 13:54:01,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=617226.6666666666, ans=0.125 2023-12-22 13:54:02,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617226.6666666666, ans=0.125 2023-12-22 13:54:06,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=617293.3333333334, ans=0.0 2023-12-22 13:54:08,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=617293.3333333334, ans=0.125 2023-12-22 13:54:11,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=617293.3333333334, ans=0.125 2023-12-22 13:54:14,210 INFO [train.py:886] (1/4) Epoch 20, batch 2050, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4950574.86 frames. 
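[Annotation] Note grad_scale rising from 32.0 to 64.0 at batch 2000 above. This is consistent with dynamic fp16 loss scaling: a GradScaler doubles its scale after a fixed run of overflow-free steps (torch.cuda.amp.GradScaler's default growth_interval is 2000) and halves it on overflow. A stock PyTorch AMP loop showing the mechanism; this is standard torch usage (requires a CUDA device), not icefall-specific code, and the init_scale is chosen to mirror the log:

```python
import torch

model = torch.nn.Linear(80, 527).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=5.44e-03)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for step in range(3):
    x = torch.randn(8, 80, device="cuda")
    y = torch.randint(0, 2, (8, 527), device="cuda").float()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(x), y)
    opt.zero_grad()
    scaler.scale(loss).backward()   # gradients computed on scale * loss
    scaler.step(opt)                # unscales; skips the step on inf/nan
    scaler.update()                 # grows/shrinks the scale dynamically
    print(step, scaler.get_scale()) # the value logged as grad_scale
```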
], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:54:15,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=617360.0, ans=0.0 2023-12-22 13:54:26,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=617426.6666666666, ans=10.0 2023-12-22 13:54:50,491 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:54:52,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617560.0, ans=0.125 2023-12-22 13:54:53,075 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 2.903e+01 3.029e+01 3.164e+01 3.600e+01, threshold=6.057e+01, percent-clipped=0.0 2023-12-22 13:54:54,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=617626.6666666666, ans=0.0 2023-12-22 13:54:59,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=617626.6666666666, ans=0.125 2023-12-22 13:55:06,642 INFO [train.py:886] (1/4) Epoch 20, batch 2100, loss[loss=0.01568, audio_tagging_loss=0.01568, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4955287.52 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:55:28,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-22 13:55:45,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=617893.3333333334, ans=0.125 2023-12-22 13:55:49,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617960.0, ans=0.1 2023-12-22 13:55:57,520 INFO [train.py:886] (1/4) Epoch 20, batch 2150, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24003.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4957874.57 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:56:28,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.92 vs. limit=22.5 2023-12-22 13:56:37,616 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+01 2.945e+01 3.054e+01 3.212e+01 3.650e+01, threshold=6.109e+01, percent-clipped=0.0 2023-12-22 13:56:47,254 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:56:49,061 INFO [train.py:886] (1/4) Epoch 20, batch 2200, loss[loss=0.01674, audio_tagging_loss=0.01674, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4952479.85 frames. 
], batch size: 99, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:57:08,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=618426.6666666666, ans=0.125 2023-12-22 13:57:29,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=618626.6666666666, ans=0.125 2023-12-22 13:57:40,637 INFO [train.py:886] (1/4) Epoch 20, batch 2250, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4947726.94 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0 2023-12-22 13:57:49,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=618760.0, ans=0.0 2023-12-22 13:57:50,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618760.0, ans=0.1 2023-12-22 13:57:51,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=618760.0, ans=0.125 2023-12-22 13:58:08,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=618826.6666666666, ans=0.125 2023-12-22 13:58:18,004 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.647e+01 2.946e+01 3.059e+01 3.211e+01 3.803e+01, threshold=6.118e+01, percent-clipped=0.0 2023-12-22 13:58:29,442 INFO [train.py:886] (1/4) Epoch 20, batch 2300, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4946924.76 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 13:58:37,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.38 vs. limit=15.0 2023-12-22 13:59:21,874 INFO [train.py:886] (1/4) Epoch 20, batch 2350, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4951605.76 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 13:59:33,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=619426.6666666666, ans=0.125 2023-12-22 13:59:53,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=619560.0, ans=0.0 2023-12-22 14:00:00,827 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.549e+01 2.868e+01 3.010e+01 3.128e+01 3.736e+01, threshold=6.020e+01, percent-clipped=0.0 2023-12-22 14:00:12,310 INFO [train.py:886] (1/4) Epoch 20, batch 2400, loss[loss=0.0155, audio_tagging_loss=0.0155, over 21459.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4948529.50 frames. 
], batch size: 107, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:00:19,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619693.3333333334, ans=0.1 2023-12-22 14:00:25,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=619760.0, ans=0.125 2023-12-22 14:00:53,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=619960.0, ans=0.125 2023-12-22 14:00:54,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=619960.0, ans=0.125 2023-12-22 14:00:56,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2023-12-22 14:01:03,801 INFO [train.py:886] (1/4) Epoch 20, batch 2450, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4952894.91 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:01:07,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620026.6666666666, ans=0.1 2023-12-22 14:01:12,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620093.3333333334, ans=0.1 2023-12-22 14:01:31,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=620160.0, ans=0.5 2023-12-22 14:01:32,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=620160.0, ans=0.2 2023-12-22 14:01:33,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=620226.6666666666, ans=0.0 2023-12-22 14:01:36,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620226.6666666666, ans=0.125 2023-12-22 14:01:41,190 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.937e+01 3.082e+01 3.201e+01 3.649e+01, threshold=6.165e+01, percent-clipped=0.0 2023-12-22 14:01:41,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2023-12-22 14:01:42,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=620293.3333333334, ans=0.0 2023-12-22 14:01:45,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620293.3333333334, ans=0.1 2023-12-22 14:01:54,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=620360.0, ans=0.125 2023-12-22 14:01:54,865 INFO [train.py:886] (1/4) Epoch 20, batch 2500, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4946661.63 frames. 
], batch size: 99, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:02:28,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=620560.0, ans=0.125 2023-12-22 14:02:30,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620560.0, ans=0.1 2023-12-22 14:02:31,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=620560.0, ans=0.0 2023-12-22 14:02:33,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=620560.0, ans=0.0 2023-12-22 14:02:34,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-12-22 14:02:37,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=620626.6666666666, ans=0.125 2023-12-22 14:02:39,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=620626.6666666666, ans=0.125 2023-12-22 14:02:44,681 INFO [train.py:886] (1/4) Epoch 20, batch 2550, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4948648.19 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:02:53,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=620693.3333333334, ans=10.0 2023-12-22 14:03:08,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=620826.6666666666, ans=0.125 2023-12-22 14:03:08,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2023-12-22 14:03:25,337 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.633e+01 2.957e+01 3.094e+01 3.249e+01 3.777e+01, threshold=6.188e+01, percent-clipped=0.0 2023-12-22 14:03:30,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=620960.0, ans=0.2 2023-12-22 14:03:30,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620960.0, ans=0.1 2023-12-22 14:03:34,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=620960.0, ans=22.5 2023-12-22 14:03:34,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=620960.0, ans=10.0 2023-12-22 14:03:35,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=620960.0, ans=0.1 2023-12-22 14:03:37,541 INFO [train.py:886] (1/4) Epoch 20, batch 2600, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4942641.50 frames. 
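[Annotation] The audio_tagging_loss is multi-label classification over the 527 AudioSet event classes, for which per-class binary cross-entropy on time-pooled logits is the usual criterion; under that assumption the ~0.013 values above are the mean BCE per class. A minimal sketch, where the pooling and the reduction are assumptions rather than the recipe's documented choices:

```python
import torch

def audio_tagging_loss(logits: torch.Tensor,
                       labels: torch.Tensor) -> torch.Tensor:
    """logits: (batch, 527) pooled over time; labels: multi-hot (batch, 527).
    Assumed criterion: per-class BCE averaged over batch and classes."""
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits, labels, reduction="mean")

logits = torch.randn(4, 527) - 5.0   # mostly-negative logits: few events on
labels = torch.zeros(4, 527)
labels[:, [0, 137]] = 1.0            # e.g. two active classes per clip
print(audio_tagging_loss(logits, labels))
```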
], batch size: 99, lr: 5.43e-03, grad_scale: 64.0 2023-12-22 14:03:37,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621026.6666666666, ans=0.1 2023-12-22 14:03:49,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=621093.3333333334, ans=0.5 2023-12-22 14:03:51,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=621093.3333333334, ans=0.2 2023-12-22 14:04:00,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 14:04:03,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=621160.0, ans=0.125 2023-12-22 14:04:28,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=15.0 2023-12-22 14:04:30,332 INFO [train.py:886] (1/4) Epoch 20, batch 2650, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4945742.11 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:04:33,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621360.0, ans=0.1 2023-12-22 14:04:49,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=621493.3333333334, ans=0.125 2023-12-22 14:05:03,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=621560.0, ans=0.125 2023-12-22 14:05:09,977 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.585e+01 2.862e+01 3.001e+01 3.149e+01 4.006e+01, threshold=6.003e+01, percent-clipped=0.0 2023-12-22 14:05:19,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=621626.6666666666, ans=0.0 2023-12-22 14:05:21,418 INFO [train.py:886] (1/4) Epoch 20, batch 2700, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4948459.64 frames. 
], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:05:30,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=621760.0, ans=0.05 2023-12-22 14:05:31,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=621760.0, ans=0.125 2023-12-22 14:05:40,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=621760.0, ans=0.2 2023-12-22 14:05:43,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=621826.6666666666, ans=0.2 2023-12-22 14:05:55,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=621893.3333333334, ans=0.09899494936611666 2023-12-22 14:06:04,887 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.205e-03 2023-12-22 14:06:12,933 INFO [train.py:886] (1/4) Epoch 20, batch 2750, loss[loss=0.0154, audio_tagging_loss=0.0154, over 24909.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4954018.73 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:06:21,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=622026.6666666666, ans=0.0 2023-12-22 14:06:30,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=622093.3333333334, ans=0.125 2023-12-22 14:06:34,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=622160.0, ans=0.125 2023-12-22 14:06:34,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=622160.0, ans=0.125 2023-12-22 14:06:35,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2023-12-22 14:06:40,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622160.0, ans=0.1 2023-12-22 14:06:41,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=622160.0, ans=0.125 2023-12-22 14:06:52,648 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.935e+01 3.098e+01 3.194e+01 3.617e+01, threshold=6.197e+01, percent-clipped=0.0 2023-12-22 14:06:57,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=622293.3333333334, ans=0.125 2023-12-22 14:06:57,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=622293.3333333334, ans=0.125 2023-12-22 14:06:59,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=622293.3333333334, ans=0.125 2023-12-22 14:07:04,123 INFO [train.py:886] (1/4) Epoch 20, batch 2800, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4949110.26 frames. 
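[Annotation] The per-batch frame counts are easy to reproduce: AudioSet clips are 10 s, so with the conventional 100 fbank frames per second and a 4x subsampling front-end (both standard for these recipes, though the rates are assumptions here), a full 100-cut batch gives 100 * 10 * 100 / 4 = 25000 encoder frames, exactly the "over 25000.00 frames" above, and 99-cut batches give 24750. The occasional 107-cut batches over ~21459 frames correspond to shorter clips, which the bucketing sampler groups together.

```python
# Back-of-the-envelope check of the logged frame counts.
clip_s = 10          # AudioSet clip length in seconds
fps = 100            # fbank frames per second (assumption)
subsampling = 4      # encoder front-end subsampling (assumption)
print(100 * clip_s * fps // subsampling)   # 25000, full batch of 100 cuts
print(99 * clip_s * fps // subsampling)    # 24750, batch of 99 cuts
```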
], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:07:08,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622360.0, ans=0.1 2023-12-22 14:07:30,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=622493.3333333334, ans=0.035 2023-12-22 14:07:38,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=622560.0, ans=0.125 2023-12-22 14:07:49,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=622626.6666666666, ans=0.95 2023-12-22 14:07:56,129 INFO [train.py:886] (1/4) Epoch 20, batch 2850, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4943963.31 frames. ], batch size: 99, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:08:04,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-22 14:08:31,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=622893.3333333334, ans=0.025 2023-12-22 14:08:34,138 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+01 2.924e+01 3.080e+01 3.230e+01 3.669e+01, threshold=6.161e+01, percent-clipped=0.0 2023-12-22 14:08:36,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.33 vs. limit=15.0 2023-12-22 14:08:46,407 INFO [train.py:886] (1/4) Epoch 20, batch 2900, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4941559.54 frames. ], batch size: 99, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:08:46,693 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.992e-03 2023-12-22 14:09:21,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=623226.6666666666, ans=0.2 2023-12-22 14:09:23,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=623226.6666666666, ans=0.0 2023-12-22 14:09:31,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=623293.3333333334, ans=0.1 2023-12-22 14:09:36,747 INFO [train.py:886] (1/4) Epoch 20, batch 2950, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4945665.37 frames. 
], batch size: 100, lr: 5.42e-03, grad_scale: 64.0 2023-12-22 14:09:37,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=623360.0, ans=0.0 2023-12-22 14:09:41,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=623360.0, ans=0.125 2023-12-22 14:09:54,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=623426.6666666666, ans=0.125 2023-12-22 14:10:05,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=623493.3333333334, ans=0.125 2023-12-22 14:10:08,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=623560.0, ans=0.0 2023-12-22 14:10:14,855 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 2.870e+01 3.046e+01 3.169e+01 3.607e+01, threshold=6.091e+01, percent-clipped=0.0 2023-12-22 14:10:18,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=623626.6666666666, ans=0.125 2023-12-22 14:10:26,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=623626.6666666666, ans=0.125 2023-12-22 14:10:28,843 INFO [train.py:886] (1/4) Epoch 20, batch 3000, loss[loss=0.01732, audio_tagging_loss=0.01732, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4951725.58 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:10:28,844 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 14:10:36,153 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6086, 3.6306, 3.2669, 3.1857], device='cuda:1') 2023-12-22 14:10:50,338 INFO [train.py:917] (1/4) Epoch 20, validation: loss=0.03313, audio_tagging_loss=0.03313, over 3737520.00 frames. 2023-12-22 14:10:50,338 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 14:11:27,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.45 vs. limit=22.5 2023-12-22 14:11:37,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=623960.0, ans=0.125 2023-12-22 14:11:40,490 INFO [train.py:886] (1/4) Epoch 20, batch 3050, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4956149.83 frames. 
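[Annotation] At batch 3000 the loop pauses for validation (train.py:909/917): the model is swept once over the dev loader and a frame-weighted loss over the whole dev set (3737520 frames) is reported, 0.03313 here against a running train loss of 0.01354. This is also where the zipformer.py:1858 diagnostic prints the attention-weights entropy of a monitored head, and where peak memory is reported, presumably via torch.cuda.max_memory_allocated(). A sketch of the validation pass, with hypothetical helper and batch-field names (the real train.py differs):

```python
import torch

def compute_validation_loss(model, dev_loader, loss_fn, device="cuda"):
    """Sketch: frame-weighted average loss over one dev-loader sweep.
    batch["inputs"], batch["labels"], batch["num_frames"] are assumed
    field names, not necessarily the recipe's actual batch layout."""
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            feats = batch["inputs"].to(device)
            labels = batch["labels"].to(device)
            frames = float(batch["num_frames"].sum())
            loss_sum += loss_fn(model(feats), labels).item() * frames
            frame_sum += frames
    model.train()
    return loss_sum / frame_sum   # logged as "validation: loss=..."
```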
], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:11:43,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=624026.6666666666, ans=0.125 2023-12-22 14:12:02,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=624160.0, ans=0.125 2023-12-22 14:12:06,631 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.058e+00 2023-12-22 14:12:12,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=624226.6666666666, ans=0.1 2023-12-22 14:12:18,571 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+01 2.950e+01 3.022e+01 3.122e+01 3.698e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 14:12:20,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-22 14:12:30,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=624360.0, ans=0.125 2023-12-22 14:12:30,783 INFO [train.py:886] (1/4) Epoch 20, batch 3100, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4955472.74 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:12:31,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=624360.0, ans=0.125 2023-12-22 14:12:50,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=624493.3333333334, ans=0.125 2023-12-22 14:12:55,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=624493.3333333334, ans=0.04949747468305833 2023-12-22 14:12:59,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=624493.3333333334, ans=0.2 2023-12-22 14:13:01,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=624560.0, ans=0.1 2023-12-22 14:13:02,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=624560.0, ans=0.09899494936611666 2023-12-22 14:13:07,040 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:13:20,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=15.0 2023-12-22 14:13:20,711 INFO [train.py:886] (1/4) Epoch 20, batch 3150, loss[loss=0.01585, audio_tagging_loss=0.01585, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4949953.71 frames. 
], batch size: 99, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:13:26,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=624693.3333333334, ans=0.0 2023-12-22 14:13:26,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=624693.3333333334, ans=0.125 2023-12-22 14:13:27,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=624693.3333333334, ans=0.2 2023-12-22 14:13:28,877 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.760e-01 2023-12-22 14:13:29,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2023-12-22 14:13:37,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=624760.0, ans=0.125 2023-12-22 14:13:39,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=624760.0, ans=0.125 2023-12-22 14:13:43,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=624826.6666666666, ans=0.125 2023-12-22 14:13:44,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2023-12-22 14:13:49,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=624826.6666666666, ans=0.125 2023-12-22 14:13:57,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=624893.3333333334, ans=0.2 2023-12-22 14:14:00,105 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.956e+01 3.114e+01 3.316e+01 3.696e+01, threshold=6.228e+01, percent-clipped=0.0 2023-12-22 14:14:11,509 INFO [train.py:886] (1/4) Epoch 20, batch 3200, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4946309.77 frames. 
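[Annotation] The many balancer entries (balancer1.prob, ans=0.125; balancer2.min_positive, ans=0.05; balancer1.max_abs, ans=10.0; ...) belong to activation balancers: modules that monitor per-channel statistics and, on a random fraction of batches given by prob, inject a small corrective gradient into channels that drift outside the configured bounds. A sketch of the statistics being monitored; the corrective-gradient machinery itself is icefall-specific and omitted here:

```python
import torch

def balancer_stats(x: torch.Tensor):
    """x: (num_frames, num_channels). Returns the per-channel statistics an
    activation balancer constrains: the fraction of positive values
    (bounded by min/max_positive) and the mean absolute value
    (bounded by min/max_abs)."""
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return frac_positive, mean_abs

x = torch.randn(1000, 8)
x[:, 0] -= 3.0   # channel 0 is almost never positive: would be nudged
frac, mabs = balancer_stats(x)
print(frac[0].item(), frac[1].item())   # ~0.001 vs ~0.5
print(mabs[0].item())                   # ~3.0, still under max_abs=10.0
```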
], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:14:14,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625026.6666666666, ans=0.1 2023-12-22 14:14:18,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.289e-02 2023-12-22 14:14:30,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=625093.3333333334, ans=0.0 2023-12-22 14:14:36,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=625160.0, ans=0.0 2023-12-22 14:14:54,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=625293.3333333334, ans=0.0 2023-12-22 14:15:01,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=625293.3333333334, ans=0.2 2023-12-22 14:15:02,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=625360.0, ans=0.0 2023-12-22 14:15:03,420 INFO [train.py:886] (1/4) Epoch 20, batch 3250, loss[loss=0.01484, audio_tagging_loss=0.01484, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4943847.24 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:15:14,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=625426.6666666666, ans=0.0 2023-12-22 14:15:22,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2023-12-22 14:15:27,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-12-22 14:15:35,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:15:40,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+01 2.857e+01 2.981e+01 3.142e+01 3.421e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 14:15:43,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=625626.6666666666, ans=0.2 2023-12-22 14:15:52,999 INFO [train.py:886] (1/4) Epoch 20, batch 3300, loss[loss=0.0158, audio_tagging_loss=0.0158, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4950505.41 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0 2023-12-22 14:15:54,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625693.3333333334, ans=0.1 2023-12-22 14:15:55,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=12.0 2023-12-22 14:15:58,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.51 vs. 
limit=15.0 2023-12-22 14:16:02,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625693.3333333334, ans=0.1 2023-12-22 14:16:32,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=625960.0, ans=0.125 2023-12-22 14:16:39,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625960.0, ans=0.1 2023-12-22 14:16:43,895 INFO [train.py:886] (1/4) Epoch 20, batch 3350, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4956089.53 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:16:45,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=626026.6666666666, ans=0.0 2023-12-22 14:16:46,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=626026.6666666666, ans=0.125 2023-12-22 14:16:57,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626093.3333333334, ans=0.1 2023-12-22 14:17:12,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=626160.0, ans=0.125 2023-12-22 14:17:18,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=626226.6666666666, ans=0.0 2023-12-22 14:17:21,209 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.948e+01 3.074e+01 3.157e+01 3.734e+01, threshold=6.147e+01, percent-clipped=0.0 2023-12-22 14:17:22,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=626293.3333333334, ans=0.2 2023-12-22 14:17:33,306 INFO [train.py:886] (1/4) Epoch 20, batch 3400, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4961284.02 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:17:57,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=626493.3333333334, ans=0.2 2023-12-22 14:18:06,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626560.0, ans=0.1 2023-12-22 14:18:17,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=626626.6666666666, ans=0.125 2023-12-22 14:18:18,771 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:18:20,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=626626.6666666666, ans=0.025 2023-12-22 14:18:21,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=626626.6666666666, ans=0.125 2023-12-22 14:18:25,373 INFO [train.py:886] (1/4) Epoch 20, batch 3450, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. 
], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4955717.40 frames. ], batch size: 99, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:18:25,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=626693.3333333334, ans=0.125 2023-12-22 14:18:37,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=626760.0, ans=0.125 2023-12-22 14:18:44,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2023-12-22 14:18:58,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=626893.3333333334, ans=0.2 2023-12-22 14:19:04,159 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+01 2.964e+01 3.104e+01 3.232e+01 3.716e+01, threshold=6.208e+01, percent-clipped=0.0 2023-12-22 14:19:06,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=626960.0, ans=0.1 2023-12-22 14:19:17,581 INFO [train.py:886] (1/4) Epoch 20, batch 3500, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4955691.02 frames. ], batch size: 99, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:19:34,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=627093.3333333334, ans=0.0 2023-12-22 14:19:47,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=15.0 2023-12-22 14:19:56,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=627226.6666666666, ans=0.125 2023-12-22 14:19:58,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627293.3333333334, ans=0.125 2023-12-22 14:20:08,207 INFO [train.py:886] (1/4) Epoch 20, batch 3550, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4953535.58 frames. ], batch size: 99, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:20:08,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=627360.0, ans=0.0 2023-12-22 14:20:10,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2023-12-22 14:20:21,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=627426.6666666666, ans=0.04949747468305833 2023-12-22 14:20:24,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.24 vs. 
limit=15.0 2023-12-22 14:20:43,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=627560.0, ans=0.125 2023-12-22 14:20:45,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=627560.0, ans=0.0 2023-12-22 14:20:49,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=627560.0, ans=0.125 2023-12-22 14:20:50,036 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+01 2.881e+01 3.030e+01 3.174e+01 3.581e+01, threshold=6.061e+01, percent-clipped=0.0 2023-12-22 14:20:56,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=627626.6666666666, ans=0.125 2023-12-22 14:21:01,412 INFO [train.py:886] (1/4) Epoch 20, batch 3600, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4957192.07 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:21:03,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=627693.3333333334, ans=0.0 2023-12-22 14:21:04,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-12-22 14:21:14,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.69 vs. limit=15.0 2023-12-22 14:21:22,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=627826.6666666666, ans=0.0 2023-12-22 14:21:32,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=627893.3333333334, ans=0.125 2023-12-22 14:21:43,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=627960.0, ans=0.125 2023-12-22 14:21:47,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=627960.0, ans=0.125 2023-12-22 14:21:53,552 INFO [train.py:886] (1/4) Epoch 20, batch 3650, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24907.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4959850.74 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:21:55,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=628026.6666666666, ans=0.09899494936611666 2023-12-22 14:21:57,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=628026.6666666666, ans=0.125 2023-12-22 14:21:58,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.41 vs. 
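[Annotation] The skip-rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, all reading ans=0.0 by batch_count ~ 627k) point to stochastic sublayer skipping used as a regularizer early in training and annealed away by this stage; the bypass.scale_min and bypass.skip_rate values similarly govern learned residual bypasses. A simplified sketch of the idea, skipping per batch rather than per sequence, which is an assumption about granularity:

```python
import torch

def maybe_skip(sublayer_out: torch.Tensor, residual: torch.Tensor,
               skip_rate: float, training: bool) -> torch.Tensor:
    """Sketch of skip-rate regularization: with probability skip_rate during
    training, drop the sublayer's contribution and keep only the residual.
    With the scheduled rates at 0.0, as logged above, the sublayer always
    contributes."""
    if training and torch.rand(()) < skip_rate:
        return residual
    return residual + sublayer_out

res = torch.zeros(2, 4)
out = torch.ones(2, 4)
print(maybe_skip(out, res, skip_rate=0.0, training=True))  # res + out always
```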
2023-12-22 14:22:32,324 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.914e+01 3.056e+01 3.191e+01 3.710e+01, threshold=6.111e+01, percent-clipped=0.0
2023-12-22 14:22:41,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=628293.3333333334, ans=0.2
2023-12-22 14:22:43,676 INFO [train.py:886] (1/4) Epoch 20, batch 3700, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4964394.48 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0
2023-12-22 14:22:51,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.32 vs. limit=15.0
2023-12-22 14:23:12,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=628493.3333333334, ans=0.2
2023-12-22 14:23:35,212 INFO [train.py:886] (1/4) Epoch 20, batch 3750, loss[loss=0.01375, audio_tagging_loss=0.01375, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4956318.97 frames. ], batch size: 99, lr: 5.39e-03, grad_scale: 64.0
2023-12-22 14:23:38,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=628693.3333333334, ans=0.0
2023-12-22 14:23:44,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=628760.0, ans=0.2
2023-12-22 14:23:52,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=12.0
2023-12-22 14:24:06,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=628893.3333333334, ans=0.0
2023-12-22 14:24:14,148 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+01 2.921e+01 3.146e+01 3.299e+01 3.816e+01, threshold=6.291e+01, percent-clipped=0.0
2023-12-22 14:24:25,608 INFO [train.py:886] (1/4) Epoch 20, batch 3800, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4945559.75 frames. ], batch size: 99, lr: 5.39e-03, grad_scale: 64.0
2023-12-22 14:24:34,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=629026.6666666666, ans=15.0
2023-12-22 14:24:45,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=629093.3333333334, ans=0.125
2023-12-22 14:25:12,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=629293.3333333334, ans=0.05
2023-12-22 14:25:18,296 INFO [train.py:886] (1/4) Epoch 20, batch 3850, loss[loss=0.014, audio_tagging_loss=0.014, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4943550.00 frames. ], batch size: 99, lr: 5.39e-03, grad_scale: 64.0
2023-12-22 14:25:28,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.14 vs. limit=22.5
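The optim.py:484 warnings are periodic gradient-norm diagnostics: the five numbers are the minimum, 25th, 50th and 75th percentiles and maximum of recent per-batch gradient norms, and the reported threshold is Clipping_scale times the median (2.0 x 3.056e+01 gives the 6.111e+01 in the first warning above), with percent-clipped giving the share of recent batches that exceeded it. A sketch of that bookkeeping, with an assumed, illustrative window of norms:

```python
import torch

# Sketch of the grad-norm diagnostic behind the optim.py warnings: keep a
# window of recent gradient norms, report quartiles, and derive the clipping
# threshold as clipping_scale * median. Window contents are illustrative.
def grad_norm_report(norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    qs = [torch.quantile(norms, q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * qs[2]  # 2 x median, matching the logged values
    pct = 100.0 * (norms > threshold).float().mean().item()
    print("grad-norm quartiles "
          + " ".join(f"{q:.3e}" for q in qs)
          + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")
    return threshold

norms = torch.tensor([25.8, 29.1, 30.6, 31.9, 37.1])  # assumed recent window
grad_norm_report(norms)  # threshold ~= 6.1e+01, as in the warnings above
```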
2023-12-22 14:25:40,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=629493.3333333334, ans=0.125
2023-12-22 14:25:52,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=629560.0, ans=0.07
2023-12-22 14:25:57,052 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.955e+01 3.052e+01 3.216e+01 3.711e+01, threshold=6.104e+01, percent-clipped=0.0
2023-12-22 14:26:04,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5
2023-12-22 14:26:10,658 INFO [train.py:886] (1/4) Epoch 20, batch 3900, loss[loss=0.01039, audio_tagging_loss=0.01039, over 22596.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4945044.90 frames. ], batch size: 107, lr: 5.39e-03, grad_scale: 64.0
2023-12-22 14:26:16,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629693.3333333334, ans=0.1
2023-12-22 14:26:25,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. limit=10.0
2023-12-22 14:26:41,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0
2023-12-22 14:26:57,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0
2023-12-22 14:26:58,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629960.0, ans=0.0
2023-12-22 14:27:01,638 INFO [train.py:886] (1/4) Epoch 20, batch 3950, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4950386.73 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0
2023-12-22 14:27:14,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=630093.3333333334, ans=0.125
2023-12-22 14:27:18,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=630093.3333333334, ans=0.0
2023-12-22 14:27:19,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. limit=22.5
2023-12-22 14:27:21,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=630093.3333333334, ans=0.0
2023-12-22 14:27:23,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0
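The scaling.py:1022 lines compare a whitening metric against a per-module limit. A plausible reading, and the sketch below assumes it, is that for a feature covariance C with eigenvalues l_1..l_n the metric is n * sum(l_i^2) / (sum(l_i))^2: it equals 1.0 when C is proportional to the identity (fully "white" activations) and grows as variance concentrates in a few directions, with the module only intervening once the metric exceeds the logged limit (hence e.g. metric=21.77 vs. limit=22.5 above).

```python
import torch

# Assumed form of the logged whitening metric: a normalized second moment
# of the covariance spectrum, equal to 1.0 for perfectly whitened features.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); single group for simplicity
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]
    n = cov.shape[0]
    return n * torch.trace(cov @ cov) / (torch.trace(cov) ** 2 + 1e-20)

x = torch.randn(1000, 192)   # illustrative activations, cf. num_channels=192
print(whitening_metric(x))   # close to 1.0 for iid noise; logged limits are 6-22.5
```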
2023-12-22 14:27:28,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=630160.0, ans=0.125
2023-12-22 14:27:31,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=630226.6666666666, ans=0.1
2023-12-22 14:27:35,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=630226.6666666666, ans=0.125
2023-12-22 14:27:40,563 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.892e+01 3.043e+01 3.159e+01 3.802e+01, threshold=6.086e+01, percent-clipped=0.0
2023-12-22 14:27:53,428 INFO [train.py:886] (1/4) Epoch 20, batch 4000, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4955621.90 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 128.0
2023-12-22 14:28:31,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=630560.0, ans=0.125
2023-12-22 14:28:41,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0
2023-12-22 14:28:44,054 INFO [train.py:886] (1/4) Epoch 20, batch 4050, loss[loss=0.01506, audio_tagging_loss=0.01506, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4958602.26 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 128.0
2023-12-22 14:28:45,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=630693.3333333334, ans=0.0
2023-12-22 14:28:46,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0
2023-12-22 14:28:59,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=630760.0, ans=0.0
2023-12-22 14:29:06,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=630826.6666666666, ans=0.0
2023-12-22 14:29:24,773 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 2.997e+01 3.121e+01 3.224e+01 3.703e+01, threshold=6.243e+01, percent-clipped=0.0
2023-12-22 14:29:27,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0
2023-12-22 14:29:36,670 INFO [train.py:886] (1/4) Epoch 20, batch 4100, loss[loss=0.01271, audio_tagging_loss=0.01271, over 23999.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4950095.77 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0
2023-12-22 14:29:41,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5
2023-12-22 14:29:42,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=631026.6666666666, ans=0.2
2023-12-22 14:29:45,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0
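Between batch 3950 and batch 4100 above, grad_scale doubles from 64.0 to 128.0 and then falls back to 64.0. That is the usual dynamic loss-scaling pattern for fp16 training: the scale grows after a stretch of overflow-free steps and is halved when a step produces inf/nan gradients. A sketch using PyTorch's stock scaler; the growth_interval here is an assumption, not this run's actual setting:

```python
import torch

# Dynamic loss scaling consistent with the grad_scale column above:
# 64 -> 128 after enough overflow-free steps, back to 64 after an overflow.
scaler = torch.cuda.amp.GradScaler(
    init_scale=64.0,      # starting grad_scale
    growth_factor=2.0,    # doubles on sustained success
    backoff_factor=0.5,   # halves when inf/nan gradients appear
    growth_interval=500,  # overflow-free steps before growing (assumed)
)

# Typical training step under autocast:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()               # adjusts the scale after each step
#   current = scaler.get_scale()  # the value train.py logs as grad_scale
```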
2023-12-22 14:29:53,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=631093.3333333334, ans=0.0
2023-12-22 14:30:03,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2023-12-22 14:30:10,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=631226.6666666666, ans=0.125
2023-12-22 14:30:20,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=631293.3333333334, ans=0.0
2023-12-22 14:30:28,236 INFO [train.py:886] (1/4) Epoch 20, batch 4150, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4950219.19 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0
2023-12-22 14:30:28,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5
2023-12-22 14:30:36,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0
2023-12-22 14:30:39,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=631426.6666666666, ans=0.125
2023-12-22 14:30:41,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=631426.6666666666, ans=0.0
2023-12-22 14:30:48,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631493.3333333334, ans=0.1
2023-12-22 14:31:07,722 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.483e+01 2.915e+01 3.076e+01 3.208e+01 3.689e+01, threshold=6.152e+01, percent-clipped=0.0
2023-12-22 14:31:08,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=631626.6666666666, ans=0.0
2023-12-22 14:31:18,920 INFO [train.py:886] (1/4) Epoch 20, batch 4200, loss[loss=0.01636, audio_tagging_loss=0.01636, over 24750.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4949425.50 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 64.0
2023-12-22 14:31:20,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631693.3333333334, ans=0.1
2023-12-22 14:31:26,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=631693.3333333334, ans=0.0
2023-12-22 14:31:49,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=631893.3333333334, ans=0.0
2023-12-22 14:31:50,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=631893.3333333334, ans=0.0
2023-12-22 14:31:58,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=631893.3333333334, ans=0.0
2023-12-22 14:32:11,379 INFO [train.py:886] (1/4) Epoch 20, batch 4250, loss[loss=0.01517, audio_tagging_loss=0.01517, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4952495.86 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0
2023-12-22 14:32:18,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=632026.6666666666, ans=0.125
2023-12-22 14:32:26,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=632093.3333333334, ans=0.2
2023-12-22 14:32:49,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=632226.6666666666, ans=0.1
2023-12-22 14:32:52,161 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.605e+01 2.864e+01 3.006e+01 3.128e+01 3.399e+01, threshold=6.011e+01, percent-clipped=0.0
2023-12-22 14:32:57,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=632293.3333333334, ans=0.0
2023-12-22 14:33:04,787 INFO [train.py:886] (1/4) Epoch 20, batch 4300, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4947147.09 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0
2023-12-22 14:33:27,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=632493.3333333334, ans=0.0
2023-12-22 14:33:34,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=632560.0, ans=0.125
2023-12-22 14:33:41,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0
2023-12-22 14:33:43,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=632560.0, ans=0.2
2023-12-22 14:33:48,618 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 14:33:54,946 INFO [train.py:886] (1/4) Epoch 20, batch 4350, loss[loss=0.01537, audio_tagging_loss=0.01537, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4953487.48 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0
2023-12-22 14:33:56,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0
2023-12-22 14:34:00,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0
2023-12-22 14:34:11,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=632760.0, ans=0.0
2023-12-22 14:34:19,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0
2023-12-22 14:34:35,356 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+01 2.921e+01 3.074e+01 3.206e+01 3.879e+01, threshold=6.148e+01, percent-clipped=0.0
2023-12-22 14:34:47,189 INFO [train.py:886] (1/4) Epoch 20, batch 4400, loss[loss=0.01593, audio_tagging_loss=0.01593, over 24750.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4946995.72 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0
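In the train.py:886 lines, loss[...] is the current batch (weighted by its frame count) while tot_loss[...] is a running frame-weighted average whose window hovers around five million frames. A decaying-sum sketch of that bookkeeping; the decay of 0.995 per batch is an assumption, chosen because with ~25k-frame batches it gives a steady-state window of 25000 / 0.005 = 5e6 frames, close to the "over ... frames" values logged here:

```python
# Sketch of the loss[...] / tot_loss[...] bookkeeping: a frame-weighted
# decaying sum. decay=0.995 is an assumption that reproduces the ~5M-frame
# steady-state window seen in the logs.
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed frame count (the "over N frames" field)

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the tot_loss that gets logged

tracker = RunningLoss()
for _ in range(50):
    tot = tracker.update(0.0137, 25000.0)
print(tracker.frames)  # ~1.11e6, close to the window logged 50 batches
                       # after an epoch restart, before it saturates near 5e6
```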
2023-12-22 14:34:47,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5
2023-12-22 14:34:53,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=633026.6666666666, ans=0.125
2023-12-22 14:34:58,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=633093.3333333334, ans=0.125
2023-12-22 14:35:14,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633160.0, ans=0.1
2023-12-22 14:35:17,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2023-12-22 14:35:20,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=633226.6666666666, ans=0.125
2023-12-22 14:35:22,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=633226.6666666666, ans=0.125
2023-12-22 14:35:38,544 INFO [train.py:886] (1/4) Epoch 20, batch 4450, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4940952.13 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0
2023-12-22 14:35:38,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=633360.0, ans=0.0
2023-12-22 14:35:46,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0
2023-12-22 14:35:47,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=633360.0, ans=15.0
2023-12-22 14:35:48,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=633360.0, ans=0.125
2023-12-22 14:35:50,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=633426.6666666666, ans=0.0
2023-12-22 14:35:59,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=633493.3333333334, ans=0.1
2023-12-22 14:36:02,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=633493.3333333334, ans=0.125
2023-12-22 14:36:07,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633493.3333333334, ans=0.125
2023-12-22 14:36:20,616 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.929e+01 3.037e+01 3.190e+01 3.598e+01, threshold=6.073e+01, percent-clipped=0.0
2023-12-22 14:36:27,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0
2023-12-22 14:36:31,010 INFO [train.py:886] (1/4) Epoch 20, batch 4500, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4945418.09 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0
2023-12-22 14:36:36,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=633693.3333333334, ans=0.125
2023-12-22 14:36:41,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=633760.0, ans=0.0
2023-12-22 14:36:59,926 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.673e-01
2023-12-22 14:37:07,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=633893.3333333334, ans=0.125
2023-12-22 14:37:17,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0
2023-12-22 14:37:19,111 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 14:37:21,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2023-12-22 14:37:23,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=634026.6666666666, ans=0.0
2023-12-22 14:37:24,388 INFO [train.py:886] (1/4) Epoch 20, batch 4550, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4952875.48 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0
2023-12-22 14:37:27,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=634026.6666666666, ans=0.2
2023-12-22 14:37:29,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634026.6666666666, ans=0.1
2023-12-22 14:37:32,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=634026.6666666666, ans=0.125
2023-12-22 14:37:44,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0
2023-12-22 14:37:46,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=634160.0, ans=0.0
2023-12-22 14:37:55,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=634226.6666666666, ans=0.0
2023-12-22 14:38:04,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.875e+01 3.021e+01 3.198e+01 3.721e+01, threshold=6.043e+01, percent-clipped=0.0
2023-12-22 14:38:10,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=634293.3333333334, ans=10.0
2023-12-22 14:38:14,812 INFO [train.py:886] (1/4) Epoch 20, batch 4600, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4949778.53 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0
2023-12-22 14:38:20,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=634360.0, ans=0.125
2023-12-22 14:38:29,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=634426.6666666666, ans=0.0
2023-12-22 14:38:38,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=634493.3333333334, ans=0.125
2023-12-22 14:38:48,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=634560.0, ans=0.125
2023-12-22 14:38:57,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=634626.6666666666, ans=0.125
2023-12-22 14:39:02,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=634626.6666666666, ans=0.2
2023-12-22 14:39:08,019 INFO [train.py:886] (1/4) Epoch 20, batch 4650, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4949393.67 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0
2023-12-22 14:39:37,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=634826.6666666666, ans=0.0
2023-12-22 14:39:41,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.41 vs. limit=15.0
2023-12-22 14:39:47,496 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.597e+01 2.896e+01 3.053e+01 3.210e+01 3.598e+01, threshold=6.106e+01, percent-clipped=0.0
2023-12-22 14:39:50,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=15.0
2023-12-22 14:39:54,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634960.0, ans=0.1
2023-12-22 14:39:57,585 INFO [train.py:886] (1/4) Epoch 20, batch 4700, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4948055.52 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0
2023-12-22 14:40:01,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=635026.6666666666, ans=0.0
2023-12-22 14:40:22,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=635160.0, ans=15.0
2023-12-22 14:40:24,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635160.0, ans=0.125
2023-12-22 14:40:26,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=635226.6666666666, ans=0.0
2023-12-22 14:40:37,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0
2023-12-22 14:40:41,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5
2023-12-22 14:40:42,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635293.3333333334, ans=0.1
2023-12-22 14:40:45,763 INFO [train.py:886] (1/4) Epoch 20, batch 4750, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4949053.75 frames. ], batch size: 99, lr: 5.36e-03, grad_scale: 64.0
2023-12-22 14:40:52,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=635360.0, ans=10.0
2023-12-22 14:41:19,713 INFO [train.py:886] (1/4) Epoch 21, batch 0, loss[loss=0.03156, audio_tagging_loss=0.03156, over 25000.00 frames. ], tot_loss[loss=0.03156, audio_tagging_loss=0.03156, over 25000.00 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0
2023-12-22 14:41:19,713 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 14:41:40,671 INFO [train.py:917] (1/4) Epoch 21, validation: loss=0.03243, audio_tagging_loss=0.03243, over 3737520.00 frames.
2023-12-22 14:41:40,671 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-22 14:41:57,722 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.779e-03
2023-12-22 14:41:59,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=635600.0, ans=0.125
2023-12-22 14:42:04,965 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 2.952e+01 3.172e+01 3.843e+01 8.854e+01, threshold=6.343e+01, percent-clipped=8.0
2023-12-22 14:42:09,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2023-12-22 14:42:13,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.62 vs. limit=10.0
2023-12-22 14:42:22,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=635733.3333333334, ans=0.0
2023-12-22 14:42:31,013 INFO [train.py:886] (1/4) Epoch 21, batch 50, loss[loss=0.01681, audio_tagging_loss=0.01681, over 25000.00 frames. ], tot_loss[loss=0.02131, audio_tagging_loss=0.02131, over 1120287.43 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0
2023-12-22 14:42:32,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=635800.0, ans=0.0
2023-12-22 14:42:38,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=635800.0, ans=0.125
2023-12-22 14:42:55,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635933.3333333334, ans=0.125
2023-12-22 14:42:57,387 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 14:43:00,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=636000.0, ans=0.0
2023-12-22 14:43:21,771 INFO [train.py:886] (1/4) Epoch 21, batch 100, loss[loss=0.01571, audio_tagging_loss=0.01571, over 25000.00 frames. ], tot_loss[loss=0.01865, audio_tagging_loss=0.01865, over 1973432.89 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0
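At the epoch boundary above, the learning rate steps from 5.36e-03 down to 5.23e-03 and then keeps shrinking slowly within the epoch, consistent with a schedule that decays in both the batch index and the epoch number. A sketch assuming an Eden-style rule with inverse-fourth-root factors in both variables; the constants are illustrative and any extra normalization the run applies is omitted, so this reproduces the shape of the lr column rather than its exact values:

```python
# Assumed Eden-style learning-rate rule: decay by inverse fourth roots in
# both batch index and epoch. Parameter values here are illustrative.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Six hundred thousand batches in, the batch factor barely moves, so the
# visible changes are the slow within-epoch drift and the step at each
# new epoch, as in the log above.
```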
2023-12-22 14:43:46,538 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.292e+01 3.512e+01 3.782e+01 4.878e+01, threshold=7.024e+01, percent-clipped=0.0
2023-12-22 14:43:54,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=636333.3333333334, ans=0.125
2023-12-22 14:43:56,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=636333.3333333334, ans=0.125
2023-12-22 14:43:58,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636333.3333333334, ans=0.125
2023-12-22 14:44:09,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=636400.0, ans=0.05
2023-12-22 14:44:13,138 INFO [train.py:886] (1/4) Epoch 21, batch 150, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 2638866.25 frames. ], batch size: 99, lr: 5.23e-03, grad_scale: 32.0
2023-12-22 14:44:15,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=636466.6666666666, ans=0.125
2023-12-22 14:44:32,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=636600.0, ans=0.1
2023-12-22 14:44:33,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=636600.0, ans=0.0
2023-12-22 14:44:36,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0
2023-12-22 14:44:49,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=636666.6666666666, ans=0.0
2023-12-22 14:44:51,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=636666.6666666666, ans=0.0
2023-12-22 14:44:53,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=636733.3333333334, ans=0.0
2023-12-22 14:44:54,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=636733.3333333334, ans=0.125
2023-12-22 14:45:03,390 INFO [train.py:886] (1/4) Epoch 21, batch 200, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 3153864.08 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0
2023-12-22 14:45:12,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=636800.0, ans=0.0
2023-12-22 14:45:25,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=636933.3333333334, ans=0.125
2023-12-22 14:45:28,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 2.981e+01 3.099e+01 3.247e+01 3.721e+01, threshold=6.198e+01, percent-clipped=0.0
2023-12-22 14:45:40,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0
2023-12-22 14:45:51,737 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 14:45:51,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=637066.6666666666, ans=0.1
2023-12-22 14:45:56,400 INFO [train.py:886] (1/4) Epoch 21, batch 250, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 3551331.05 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0
2023-12-22 14:46:07,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=637200.0, ans=0.5
2023-12-22 14:46:16,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=637266.6666666666, ans=0.125
2023-12-22 14:46:19,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=637266.6666666666, ans=0.125
2023-12-22 14:46:22,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
2023-12-22 14:46:48,395 INFO [train.py:886] (1/4) Epoch 21, batch 300, loss[loss=0.009607, audio_tagging_loss=0.009607, over 24750.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 3855692.16 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:46:48,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0
2023-12-22 14:47:04,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=637533.3333333334, ans=0.015
2023-12-22 14:47:12,446 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.578e+01 2.934e+01 3.059e+01 3.180e+01 3.932e+01, threshold=6.118e+01, percent-clipped=0.0
2023-12-22 14:47:12,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=637600.0, ans=0.125
2023-12-22 14:47:15,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=637600.0, ans=0.0
2023-12-22 14:47:20,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=637666.6666666666, ans=0.125
2023-12-22 14:47:21,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=637666.6666666666, ans=0.125
2023-12-22 14:47:34,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=637733.3333333334, ans=0.0
2023-12-22 14:47:39,923 INFO [train.py:886] (1/4) Epoch 21, batch 350, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4098776.60 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:47:43,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=637800.0, ans=0.09899494936611666
2023-12-22 14:48:08,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5
2023-12-22 14:48:15,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=638000.0, ans=0.2
2023-12-22 14:48:16,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=638000.0, ans=0.125
2023-12-22 14:48:17,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=638000.0, ans=0.125
2023-12-22 14:48:23,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0
2023-12-22 14:48:31,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=638133.3333333334, ans=0.0
2023-12-22 14:48:32,062 INFO [train.py:886] (1/4) Epoch 21, batch 400, loss[loss=0.01562, audio_tagging_loss=0.01562, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4288067.32 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:48:33,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638133.3333333334, ans=0.1
2023-12-22 14:48:38,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=638133.3333333334, ans=0.0
2023-12-22 14:48:42,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0
2023-12-22 14:48:50,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=638200.0, ans=0.125
2023-12-22 14:48:50,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=638200.0, ans=0.125
2023-12-22 14:48:50,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0
2023-12-22 14:48:56,996 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.877e+01 2.996e+01 3.140e+01 3.707e+01, threshold=5.991e+01, percent-clipped=0.0
2023-12-22 14:49:01,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2023-12-22 14:49:07,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=638333.3333333334, ans=0.125
2023-12-22 14:49:15,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=638400.0, ans=0.125
2023-12-22 14:49:21,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638400.0, ans=0.1
2023-12-22 14:49:23,556 INFO [train.py:886] (1/4) Epoch 21, batch 450, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4436732.48 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:49:38,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=638533.3333333334, ans=0.125
2023-12-22 14:49:53,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0
2023-12-22 14:50:13,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=638733.3333333334, ans=0.125
2023-12-22 14:50:16,422 INFO [train.py:886] (1/4) Epoch 21, batch 500, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4554507.78 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:50:32,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=638866.6666666666, ans=0.025
2023-12-22 14:50:41,263 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 2.918e+01 3.045e+01 3.140e+01 3.683e+01, threshold=6.089e+01, percent-clipped=0.0
2023-12-22 14:51:07,966 INFO [train.py:886] (1/4) Epoch 21, batch 550, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4644497.84 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:51:08,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=639133.3333333334, ans=0.125
2023-12-22 14:51:12,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=639133.3333333334, ans=0.125
2023-12-22 14:51:17,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0
2023-12-22 14:51:18,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=639200.0, ans=0.125
2023-12-22 14:51:20,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=639200.0, ans=0.125
2023-12-22 14:51:24,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=639200.0, ans=0.1
2023-12-22 14:51:45,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=639333.3333333334, ans=0.1
2023-12-22 14:51:49,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=639400.0, ans=0.125
2023-12-22 14:51:51,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=639400.0, ans=0.0
2023-12-22 14:51:59,315 INFO [train.py:886] (1/4) Epoch 21, batch 600, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4713303.76 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:52:15,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=639533.3333333334, ans=0.0
2023-12-22 14:52:24,050 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.726e+01 2.947e+01 3.052e+01 3.171e+01 3.659e+01, threshold=6.104e+01, percent-clipped=0.0
2023-12-22 14:52:49,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639733.3333333334, ans=0.1
2023-12-22 14:52:51,554 INFO [train.py:886] (1/4) Epoch 21, batch 650, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4767165.97 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0
2023-12-22 14:52:55,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=639800.0, ans=0.2
2023-12-22 14:52:56,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2023-12-22 14:53:02,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639866.6666666666, ans=0.1
2023-12-22 14:53:05,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=639866.6666666666, ans=0.1
2023-12-22 14:53:29,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=640000.0, ans=0.2
2023-12-22 14:53:35,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5
2023-12-22 14:53:42,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=640066.6666666666, ans=0.0
2023-12-22 14:53:45,861 INFO [train.py:886] (1/4) Epoch 21, batch 700, loss[loss=0.01479, audio_tagging_loss=0.01479, over 23989.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4806839.62 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:53:47,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=640133.3333333334, ans=0.125
2023-12-22 14:53:55,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=640200.0, ans=0.0
2023-12-22 14:54:09,893 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.917e+01 3.103e+01 3.196e+01 3.559e+01, threshold=6.207e+01, percent-clipped=0.0
2023-12-22 14:54:37,377 INFO [train.py:886] (1/4) Epoch 21, batch 750, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4840148.60 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:54:40,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0
2023-12-22 14:55:06,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=640600.0, ans=0.2
2023-12-22 14:55:20,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=640733.3333333334, ans=0.125
2023-12-22 14:55:30,040 INFO [train.py:886] (1/4) Epoch 21, batch 800, loss[loss=0.009505, audio_tagging_loss=0.009505, over 24004.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4866725.12 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:55:36,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=640800.0, ans=0.0
2023-12-22 14:55:55,172 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.577e+01 2.887e+01 3.023e+01 3.199e+01 3.565e+01, threshold=6.047e+01, percent-clipped=0.0
2023-12-22 14:56:04,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=641000.0, ans=0.0
2023-12-22 14:56:06,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=641000.0, ans=0.125
2023-12-22 14:56:21,501 INFO [train.py:886] (1/4) Epoch 21, batch 850, loss[loss=0.01652, audio_tagging_loss=0.01652, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4887142.69 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:56:22,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2023-12-22 14:56:49,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0
2023-12-22 14:56:59,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2023-12-22 14:57:02,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=641333.3333333334, ans=0.0
2023-12-22 14:57:06,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=641400.0, ans=0.0
2023-12-22 14:57:13,897 INFO [train.py:886] (1/4) Epoch 21, batch 900, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4907091.53 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:57:33,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=641600.0, ans=10.0
2023-12-22 14:57:35,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=641600.0, ans=0.04949747468305833
2023-12-22 14:57:38,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641600.0, ans=0.1
2023-12-22 14:57:39,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.708e+01 2.906e+01 3.042e+01 3.221e+01 3.641e+01, threshold=6.083e+01, percent-clipped=0.0
2023-12-22 14:57:42,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=22.5
2023-12-22 14:57:45,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641666.6666666666, ans=0.1
2023-12-22 14:57:47,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=641666.6666666666, ans=0.025
2023-12-22 14:57:47,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=641666.6666666666, ans=0.125
2023-12-22 14:57:48,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=641666.6666666666, ans=0.125
2023-12-22 14:57:58,207 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.022e-03
2023-12-22 14:58:06,027 INFO [train.py:886] (1/4) Epoch 21, batch 950, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4909414.59 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:58:15,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0
2023-12-22 14:58:25,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5
2023-12-22 14:58:26,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=641933.3333333334, ans=0.125
2023-12-22 14:58:56,670 INFO [train.py:886] (1/4) Epoch 21, batch 1000, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4913209.89 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0
2023-12-22 14:59:13,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0
2023-12-22 14:59:17,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642266.6666666666, ans=0.1
2023-12-22 14:59:21,804 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+01 2.900e+01 3.063e+01 3.236e+01 3.644e+01, threshold=6.126e+01, percent-clipped=0.0
2023-12-22 14:59:22,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0
2023-12-22 14:59:48,511 INFO [train.py:886] (1/4) Epoch 21, batch 1050, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4920805.33 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0
2023-12-22 14:59:54,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=12.0
2023-12-22 15:00:00,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0
2023-12-22 15:00:03,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.75 vs. limit=22.5
2023-12-22 15:00:15,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=642600.0, ans=0.125
2023-12-22 15:00:40,273 INFO [train.py:886] (1/4) Epoch 21, batch 1100, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4922323.15 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0
2023-12-22 15:00:44,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0
2023-12-22 15:00:53,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=642866.6666666666, ans=0.1
2023-12-22 15:01:04,309 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 2.897e+01 3.078e+01 3.244e+01 5.460e+01, threshold=6.156e+01, percent-clipped=0.0
2023-12-22 15:01:04,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0
2023-12-22 15:01:08,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642933.3333333334, ans=0.1
2023-12-22 15:01:26,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643066.6666666666, ans=0.1
2023-12-22 15:01:27,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=643066.6666666666, ans=0.125
2023-12-22 15:01:30,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=643066.6666666666, ans=0.0
2023-12-22 15:01:31,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=643133.3333333334, ans=0.125
2023-12-22 15:01:32,010 INFO [train.py:886] (1/4) Epoch 21, batch 1150, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4931824.00 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0
2023-12-22 15:01:37,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=643133.3333333334, ans=0.125
2023-12-22 15:01:43,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=643200.0, ans=15.0
2023-12-22 15:01:47,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=643200.0, ans=0.0
2023-12-22 15:01:55,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=643266.6666666666, ans=0.125
2023-12-22 15:01:55,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0
2023-12-22 15:02:16,735 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:02:21,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=643400.0, ans=0.125
2023-12-22 15:02:23,715 INFO [train.py:886] (1/4) Epoch 21, batch 1200, loss[loss=0.01365, audio_tagging_loss=0.01365, over 22791.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4931949.48 frames. ], batch size: 107, lr: 5.20e-03, grad_scale: 32.0
], batch size: 107, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:02:26,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=643466.6666666666, ans=0.125 2023-12-22 15:02:36,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=643533.3333333334, ans=0.0 2023-12-22 15:02:39,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643533.3333333334, ans=0.1 2023-12-22 15:02:47,768 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.913e+01 3.055e+01 3.244e+01 3.742e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 15:03:12,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643733.3333333334, ans=0.1 2023-12-22 15:03:14,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2023-12-22 15:03:14,490 INFO [train.py:886] (1/4) Epoch 21, batch 1250, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4931045.44 frames. ], batch size: 99, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:03:20,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=643800.0, ans=0.125 2023-12-22 15:03:23,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-22 15:03:26,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=643866.6666666666, ans=0.125 2023-12-22 15:03:33,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=643866.6666666666, ans=0.125 2023-12-22 15:03:38,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=643933.3333333334, ans=0.0 2023-12-22 15:03:42,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=643933.3333333334, ans=0.2 2023-12-22 15:03:44,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644000.0, ans=0.1 2023-12-22 15:03:46,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=644000.0, ans=0.2 2023-12-22 15:03:51,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=644000.0, ans=0.125 2023-12-22 15:04:03,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.72 vs. limit=10.0 2023-12-22 15:04:06,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644133.3333333334, ans=0.1 2023-12-22 15:04:07,501 INFO [train.py:886] (1/4) Epoch 21, batch 1300, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4934974.74 frames. 
], batch size: 99, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:04:07,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=644133.3333333334, ans=0.125 2023-12-22 15:04:33,042 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.559e+01 2.961e+01 3.123e+01 3.298e+01 3.701e+01, threshold=6.246e+01, percent-clipped=0.0 2023-12-22 15:04:42,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-12-22 15:04:52,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=644400.0, ans=0.2 2023-12-22 15:04:59,029 INFO [train.py:886] (1/4) Epoch 21, batch 1350, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4935097.71 frames. ], batch size: 99, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:05:24,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=644600.0, ans=0.0 2023-12-22 15:05:24,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.73 vs. limit=15.0 2023-12-22 15:05:32,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=644666.6666666666, ans=0.05 2023-12-22 15:05:44,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=644733.3333333334, ans=0.09899494936611666 2023-12-22 15:05:45,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=644733.3333333334, ans=0.125 2023-12-22 15:05:48,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=644733.3333333334, ans=0.2 2023-12-22 15:05:50,493 INFO [train.py:886] (1/4) Epoch 21, batch 1400, loss[loss=0.01389, audio_tagging_loss=0.01389, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4942685.18 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:06:07,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=644866.6666666666, ans=0.125 2023-12-22 15:06:15,742 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.918e+01 3.034e+01 3.190e+01 3.697e+01, threshold=6.068e+01, percent-clipped=0.0 2023-12-22 15:06:43,069 INFO [train.py:886] (1/4) Epoch 21, batch 1450, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4948635.14 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:07:00,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2023-12-22 15:07:14,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=645333.3333333334, ans=0.125 2023-12-22 15:07:19,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. 
limit=15.0 2023-12-22 15:07:30,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=645400.0, ans=0.125 2023-12-22 15:07:33,420 INFO [train.py:886] (1/4) Epoch 21, batch 1500, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4948934.53 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:07:41,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=645466.6666666666, ans=0.025 2023-12-22 15:07:59,017 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.868e+01 3.009e+01 3.172e+01 3.976e+01, threshold=6.018e+01, percent-clipped=0.0 2023-12-22 15:08:04,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=645666.6666666666, ans=0.125 2023-12-22 15:08:16,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=645733.3333333334, ans=0.125 2023-12-22 15:08:18,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=645733.3333333334, ans=0.0 2023-12-22 15:08:26,462 INFO [train.py:886] (1/4) Epoch 21, batch 1550, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4949449.56 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:08:29,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.07 vs. limit=22.5 2023-12-22 15:08:38,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=12.0 2023-12-22 15:08:43,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0 2023-12-22 15:08:45,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=645866.6666666666, ans=0.125 2023-12-22 15:08:58,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=646000.0, ans=0.0 2023-12-22 15:09:06,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=646066.6666666666, ans=0.05 2023-12-22 15:09:07,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=646066.6666666666, ans=0.125 2023-12-22 15:09:16,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=646066.6666666666, ans=0.0 2023-12-22 15:09:18,966 INFO [train.py:886] (1/4) Epoch 21, batch 1600, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4945347.73 frames. 
], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:09:42,938 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+01 3.012e+01 3.134e+01 3.270e+01 4.139e+01, threshold=6.268e+01, percent-clipped=0.0 2023-12-22 15:10:01,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2023-12-22 15:10:09,633 INFO [train.py:886] (1/4) Epoch 21, batch 1650, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4942264.29 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:10:58,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=646733.3333333334, ans=0.125 2023-12-22 15:11:02,006 INFO [train.py:886] (1/4) Epoch 21, batch 1700, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4942977.59 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:11:03,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-22 15:11:17,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=646866.6666666666, ans=0.125 2023-12-22 15:11:23,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=646933.3333333334, ans=0.125 2023-12-22 15:11:25,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2023-12-22 15:11:27,469 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.927e+01 3.026e+01 3.153e+01 3.833e+01, threshold=6.051e+01, percent-clipped=0.0 2023-12-22 15:11:52,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=647066.6666666666, ans=0.0 2023-12-22 15:11:54,512 INFO [train.py:886] (1/4) Epoch 21, batch 1750, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4952402.87 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:12:06,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=647200.0, ans=0.125 2023-12-22 15:12:11,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=647200.0, ans=0.125 2023-12-22 15:12:20,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=647266.6666666666, ans=0.125 2023-12-22 15:12:28,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=647333.3333333334, ans=0.125 2023-12-22 15:12:42,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=647400.0, ans=0.0 2023-12-22 15:12:46,355 INFO [train.py:886] (1/4) Epoch 21, batch 1800, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4958295.62 frames. 
], batch size: 100, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:12:55,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=647533.3333333334, ans=0.125 2023-12-22 15:12:57,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647533.3333333334, ans=0.125 2023-12-22 15:13:03,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=647533.3333333334, ans=0.0 2023-12-22 15:13:11,089 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+01 2.946e+01 3.053e+01 3.147e+01 3.589e+01, threshold=6.107e+01, percent-clipped=0.0 2023-12-22 15:13:27,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=647733.3333333334, ans=0.125 2023-12-22 15:13:33,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=647733.3333333334, ans=0.0 2023-12-22 15:13:38,500 INFO [train.py:886] (1/4) Epoch 21, batch 1850, loss[loss=0.01048, audio_tagging_loss=0.01048, over 24075.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4957482.28 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:13:45,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=647800.0, ans=0.0 2023-12-22 15:13:50,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-12-22 15:13:50,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2023-12-22 15:13:51,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=647866.6666666666, ans=0.0 2023-12-22 15:13:58,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-12-22 15:14:23,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=648066.6666666666, ans=0.125 2023-12-22 15:14:29,466 INFO [train.py:886] (1/4) Epoch 21, batch 1900, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4955338.76 frames. ], batch size: 99, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:14:54,205 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+01 2.961e+01 3.101e+01 3.299e+01 3.976e+01, threshold=6.202e+01, percent-clipped=0.0 2023-12-22 15:14:59,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.97 vs. limit=22.5 2023-12-22 15:15:21,530 INFO [train.py:886] (1/4) Epoch 21, batch 1950, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4947344.47 frames. ], batch size: 99, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:16:04,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. 
limit=15.0 2023-12-22 15:16:13,184 INFO [train.py:886] (1/4) Epoch 21, batch 2000, loss[loss=0.01043, audio_tagging_loss=0.01043, over 24007.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4949132.67 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:16:26,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-12-22 15:16:29,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=648866.6666666666, ans=0.125 2023-12-22 15:16:37,594 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.920e+01 3.050e+01 3.228e+01 3.734e+01, threshold=6.100e+01, percent-clipped=0.0 2023-12-22 15:16:49,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=649000.0, ans=0.2 2023-12-22 15:17:03,714 INFO [train.py:886] (1/4) Epoch 21, batch 2050, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4951968.63 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:17:08,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=649133.3333333334, ans=0.125 2023-12-22 15:17:22,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=649200.0, ans=0.125 2023-12-22 15:17:26,647 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.558e-03 2023-12-22 15:17:32,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-12-22 15:17:35,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=649333.3333333334, ans=0.125 2023-12-22 15:17:42,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=12.0 2023-12-22 15:17:56,874 INFO [train.py:886] (1/4) Epoch 21, batch 2100, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4950436.75 frames. 
], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:17:59,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=649466.6666666666, ans=0.125 2023-12-22 15:18:04,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=649466.6666666666, ans=0.125 2023-12-22 15:18:21,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=649600.0, ans=0.0 2023-12-22 15:18:21,962 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.928e+01 3.055e+01 3.207e+01 3.778e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 15:18:29,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=649666.6666666666, ans=0.125 2023-12-22 15:18:32,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=649666.6666666666, ans=0.0 2023-12-22 15:18:44,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=649733.3333333334, ans=0.07 2023-12-22 15:18:47,494 INFO [train.py:886] (1/4) Epoch 21, batch 2150, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4956960.55 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:19:03,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=649866.6666666666, ans=0.0 2023-12-22 15:19:13,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=649933.3333333334, ans=0.125 2023-12-22 15:19:24,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0 2023-12-22 15:19:32,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-12-22 15:19:35,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=650066.6666666666, ans=0.125 2023-12-22 15:19:38,976 INFO [train.py:886] (1/4) Epoch 21, batch 2200, loss[loss=0.0154, audio_tagging_loss=0.0154, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4946690.69 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:19:46,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. 
limit=22.5 2023-12-22 15:19:58,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=650200.0, ans=0.125 2023-12-22 15:20:04,334 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+01 2.983e+01 3.116e+01 3.285e+01 3.791e+01, threshold=6.231e+01, percent-clipped=0.0 2023-12-22 15:20:04,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650266.6666666666, ans=0.1 2023-12-22 15:20:30,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2023-12-22 15:20:30,694 INFO [train.py:886] (1/4) Epoch 21, batch 2250, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4945794.23 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:20:35,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=650466.6666666666, ans=0.125 2023-12-22 15:20:37,313 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:20:44,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650533.3333333334, ans=0.125 2023-12-22 15:20:51,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=650600.0, ans=0.0 2023-12-22 15:20:55,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2023-12-22 15:21:06,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=650666.6666666666, ans=0.125 2023-12-22 15:21:07,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=650666.6666666666, ans=0.125 2023-12-22 15:21:22,074 INFO [train.py:886] (1/4) Epoch 21, batch 2300, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4943533.81 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:21:25,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=650800.0, ans=0.04949747468305833 2023-12-22 15:21:35,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650866.6666666666, ans=0.125 2023-12-22 15:21:37,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2023-12-22 15:21:44,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=15.0 2023-12-22 15:21:46,905 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.617e+01 2.891e+01 3.019e+01 3.134e+01 3.586e+01, threshold=6.037e+01, percent-clipped=0.0 2023-12-22 15:21:50,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=650933.3333333334, ans=0.125 2023-12-22 15:21:52,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=651000.0, ans=0.125 2023-12-22 15:21:57,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=651000.0, ans=0.0 2023-12-22 15:22:11,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=651066.6666666666, ans=0.0 2023-12-22 15:22:12,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=651066.6666666666, ans=0.125 2023-12-22 15:22:12,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=651066.6666666666, ans=0.0 2023-12-22 15:22:14,363 INFO [train.py:886] (1/4) Epoch 21, batch 2350, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4949454.42 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:22:37,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=651266.6666666666, ans=0.125 2023-12-22 15:22:38,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=651266.6666666666, ans=0.0 2023-12-22 15:22:42,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=651266.6666666666, ans=0.0 2023-12-22 15:22:49,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=651333.3333333334, ans=0.0 2023-12-22 15:22:51,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=651333.3333333334, ans=0.125 2023-12-22 15:23:02,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=651400.0, ans=0.125 2023-12-22 15:23:05,390 INFO [train.py:886] (1/4) Epoch 21, batch 2400, loss[loss=0.01497, audio_tagging_loss=0.01497, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4955576.30 frames. 
], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:23:15,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=651466.6666666666, ans=0.125 2023-12-22 15:23:23,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=651533.3333333334, ans=0.0 2023-12-22 15:23:25,732 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:23:27,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651600.0, ans=0.1 2023-12-22 15:23:30,201 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.897e+01 3.019e+01 3.181e+01 3.470e+01, threshold=6.039e+01, percent-clipped=0.0 2023-12-22 15:23:35,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=651666.6666666666, ans=0.0 2023-12-22 15:23:57,922 INFO [train.py:886] (1/4) Epoch 21, batch 2450, loss[loss=0.01632, audio_tagging_loss=0.01632, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4958221.15 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:24:11,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=651866.6666666666, ans=0.0 2023-12-22 15:24:20,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=651933.3333333334, ans=0.0 2023-12-22 15:24:26,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=651933.3333333334, ans=0.04949747468305833 2023-12-22 15:24:27,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=651933.3333333334, ans=0.2 2023-12-22 15:24:28,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=652000.0, ans=0.125 2023-12-22 15:24:28,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2023-12-22 15:24:39,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652066.6666666666, ans=0.1 2023-12-22 15:24:50,606 INFO [train.py:886] (1/4) Epoch 21, batch 2500, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4955935.19 frames. 
], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:24:58,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=652133.3333333334, ans=0.125 2023-12-22 15:25:07,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=652200.0, ans=0.0 2023-12-22 15:25:14,693 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.708e+01 3.022e+01 3.140e+01 3.250e+01 3.693e+01, threshold=6.280e+01, percent-clipped=0.0 2023-12-22 15:25:31,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=652400.0, ans=0.025 2023-12-22 15:25:31,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=652400.0, ans=0.125 2023-12-22 15:25:31,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=652400.0, ans=0.125 2023-12-22 15:25:35,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=652400.0, ans=0.0 2023-12-22 15:25:40,923 INFO [train.py:886] (1/4) Epoch 21, batch 2550, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4953846.46 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:25:58,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=652533.3333333334, ans=0.125 2023-12-22 15:26:02,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-12-22 15:26:12,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2023-12-22 15:26:14,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=652666.6666666666, ans=0.125 2023-12-22 15:26:29,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=652733.3333333334, ans=0.125 2023-12-22 15:26:33,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=652800.0, ans=0.09899494936611666 2023-12-22 15:26:34,214 INFO [train.py:886] (1/4) Epoch 21, batch 2600, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4949454.36 frames. 
], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:26:42,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652866.6666666666, ans=0.125 2023-12-22 15:26:49,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=652866.6666666666, ans=0.125 2023-12-22 15:26:58,943 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+01 2.930e+01 3.065e+01 3.223e+01 3.938e+01, threshold=6.130e+01, percent-clipped=0.0 2023-12-22 15:27:10,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=653000.0, ans=0.04949747468305833 2023-12-22 15:27:18,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=653066.6666666666, ans=0.125 2023-12-22 15:27:26,043 INFO [train.py:886] (1/4) Epoch 21, batch 2650, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4946528.63 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:27:32,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.62 vs. limit=22.5 2023-12-22 15:27:38,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653200.0, ans=0.1 2023-12-22 15:28:05,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=653333.3333333334, ans=0.125 2023-12-22 15:28:09,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=653400.0, ans=0.0 2023-12-22 15:28:17,705 INFO [train.py:886] (1/4) Epoch 21, batch 2700, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24004.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4948926.79 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:28:26,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=653466.6666666666, ans=0.125 2023-12-22 15:28:30,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.21 vs. limit=15.0 2023-12-22 15:28:31,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653533.3333333334, ans=0.1 2023-12-22 15:28:35,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. 
limit=22.5 2023-12-22 15:28:43,280 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.932e+01 3.064e+01 3.237e+01 3.661e+01, threshold=6.127e+01, percent-clipped=0.0 2023-12-22 15:28:53,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=653666.6666666666, ans=0.2 2023-12-22 15:28:57,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=653666.6666666666, ans=0.125 2023-12-22 15:28:57,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=653666.6666666666, ans=0.1 2023-12-22 15:29:10,339 INFO [train.py:886] (1/4) Epoch 21, batch 2750, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4954039.04 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:29:17,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2023-12-22 15:29:27,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=12.0 2023-12-22 15:29:38,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2023-12-22 15:29:54,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=654066.6666666666, ans=0.125 2023-12-22 15:29:57,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2023-12-22 15:30:02,224 INFO [train.py:886] (1/4) Epoch 21, batch 2800, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24030.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4955138.92 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:30:11,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=654133.3333333334, ans=0.0 2023-12-22 15:30:26,918 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.655e+01 2.987e+01 3.081e+01 3.261e+01 3.744e+01, threshold=6.161e+01, percent-clipped=0.0 2023-12-22 15:30:46,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-12-22 15:30:54,049 INFO [train.py:886] (1/4) Epoch 21, batch 2850, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4948870.24 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:30:55,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-12-22 15:31:05,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
limit=6.0 2023-12-22 15:31:16,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=654600.0, ans=0.125 2023-12-22 15:31:21,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=654600.0, ans=0.0 2023-12-22 15:31:28,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=654666.6666666666, ans=0.0 2023-12-22 15:31:32,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=654666.6666666666, ans=0.0 2023-12-22 15:31:37,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=654733.3333333334, ans=0.0 2023-12-22 15:31:46,810 INFO [train.py:886] (1/4) Epoch 21, batch 2900, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24091.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4946060.92 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:31:53,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=654800.0, ans=0.125 2023-12-22 15:31:54,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654800.0, ans=0.1 2023-12-22 15:32:02,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=654866.6666666666, ans=0.125 2023-12-22 15:32:11,069 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+01 2.895e+01 3.036e+01 3.201e+01 4.104e+01, threshold=6.072e+01, percent-clipped=0.0 2023-12-22 15:32:34,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=655066.6666666666, ans=0.125 2023-12-22 15:32:37,565 INFO [train.py:886] (1/4) Epoch 21, batch 2950, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4949199.47 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:32:42,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2023-12-22 15:32:51,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=12.0 2023-12-22 15:32:53,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655200.0, ans=0.125 2023-12-22 15:32:59,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655266.6666666666, ans=0.1 2023-12-22 15:33:19,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=655400.0, ans=0.125 2023-12-22 15:33:24,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=655400.0, ans=0.125 2023-12-22 15:33:27,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. 
limit=15.0 2023-12-22 15:33:29,611 INFO [train.py:886] (1/4) Epoch 21, batch 3000, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4949637.76 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:33:29,611 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 15:33:50,879 INFO [train.py:917] (1/4) Epoch 21, validation: loss=0.03274, audio_tagging_loss=0.03274, over 3737520.00 frames. 2023-12-22 15:33:50,880 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 15:33:58,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=655466.6666666666, ans=0.05 2023-12-22 15:34:14,635 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.605e+01 2.876e+01 3.036e+01 3.151e+01 3.734e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 15:34:22,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=655666.6666666666, ans=0.125 2023-12-22 15:34:35,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=655733.3333333334, ans=15.0 2023-12-22 15:34:36,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=655733.3333333334, ans=0.125 2023-12-22 15:34:41,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=22.40 vs. limit=15.0 2023-12-22 15:34:41,397 INFO [train.py:886] (1/4) Epoch 21, batch 3050, loss[loss=0.0135, audio_tagging_loss=0.0135, over 21952.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4954881.60 frames. ], batch size: 107, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:34:45,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=655800.0, ans=0.2 2023-12-22 15:34:47,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=655800.0, ans=0.125 2023-12-22 15:35:05,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-12-22 15:35:33,734 INFO [train.py:886] (1/4) Epoch 21, batch 3100, loss[loss=0.01405, audio_tagging_loss=0.01405, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4962096.12 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:35:33,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=656133.3333333334, ans=0.0 2023-12-22 15:35:43,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2023-12-22 15:35:58,327 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.673e+01 2.955e+01 3.066e+01 3.256e+01 3.692e+01, threshold=6.132e+01, percent-clipped=0.0 2023-12-22 15:36:19,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.74 vs. 
limit=22.5 2023-12-22 15:36:25,927 INFO [train.py:886] (1/4) Epoch 21, batch 3150, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4960113.11 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:37:04,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=656666.6666666666, ans=0.0 2023-12-22 15:37:16,938 INFO [train.py:886] (1/4) Epoch 21, batch 3200, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4953669.77 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:37:20,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=656800.0, ans=0.125 2023-12-22 15:37:21,985 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:37:28,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=656866.6666666666, ans=0.125 2023-12-22 15:37:29,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=656866.6666666666, ans=0.0 2023-12-22 15:37:29,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=656866.6666666666, ans=0.0 2023-12-22 15:37:42,410 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.707e+01 2.933e+01 3.051e+01 3.239e+01 4.108e+01, threshold=6.103e+01, percent-clipped=0.0 2023-12-22 15:37:49,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2023-12-22 15:37:56,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=657000.0, ans=0.0 2023-12-22 15:38:09,721 INFO [train.py:886] (1/4) Epoch 21, batch 3250, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4948086.12 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:38:18,384 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:38:24,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=657200.0, ans=0.125 2023-12-22 15:38:35,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-12-22 15:38:37,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=657266.6666666666, ans=0.0 2023-12-22 15:38:47,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0 2023-12-22 15:38:48,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.04 vs. 
limit=12.0 2023-12-22 15:38:51,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=657400.0, ans=0.09899494936611666 2023-12-22 15:38:58,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0 2023-12-22 15:39:00,444 INFO [train.py:886] (1/4) Epoch 21, batch 3300, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4946119.94 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:39:04,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657466.6666666666, ans=0.1 2023-12-22 15:39:21,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=657600.0, ans=0.2 2023-12-22 15:39:24,988 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+01 2.897e+01 3.042e+01 3.162e+01 3.785e+01, threshold=6.083e+01, percent-clipped=0.0 2023-12-22 15:39:51,623 INFO [train.py:886] (1/4) Epoch 21, batch 3350, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4949834.75 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:39:54,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=657800.0, ans=0.125 2023-12-22 15:40:08,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=657866.6666666666, ans=0.0 2023-12-22 15:40:25,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=658000.0, ans=0.125 2023-12-22 15:40:32,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=658066.6666666666, ans=0.125 2023-12-22 15:40:43,801 INFO [train.py:886] (1/4) Epoch 21, batch 3400, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4952377.38 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:41:07,160 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.745e+01 2.968e+01 3.084e+01 3.241e+01 3.911e+01, threshold=6.167e+01, percent-clipped=0.0 2023-12-22 15:41:34,388 INFO [train.py:886] (1/4) Epoch 21, batch 3450, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4949714.43 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:42:01,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=658600.0, ans=0.0 2023-12-22 15:42:02,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. 
limit=6.0 2023-12-22 15:42:04,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=658666.6666666666, ans=0.125 2023-12-22 15:42:13,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=658666.6666666666, ans=0.0 2023-12-22 15:42:26,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=658800.0, ans=0.125 2023-12-22 15:42:27,473 INFO [train.py:886] (1/4) Epoch 21, batch 3500, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4948194.70 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:42:52,329 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.914e+01 3.083e+01 3.218e+01 3.665e+01, threshold=6.166e+01, percent-clipped=0.0 2023-12-22 15:42:57,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=658933.3333333334, ans=0.125 2023-12-22 15:43:03,680 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:43:10,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-12-22 15:43:18,619 INFO [train.py:886] (1/4) Epoch 21, batch 3550, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4945401.54 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:43:37,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=659200.0, ans=0.125 2023-12-22 15:43:47,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=22.5 2023-12-22 15:44:10,684 INFO [train.py:886] (1/4) Epoch 21, batch 3600, loss[loss=0.01635, audio_tagging_loss=0.01635, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4948444.13 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:44:28,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2023-12-22 15:44:36,331 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.979e+01 3.108e+01 3.250e+01 3.657e+01, threshold=6.215e+01, percent-clipped=0.0 2023-12-22 15:44:36,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=659600.0, ans=0.0 2023-12-22 15:44:40,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=659600.0, ans=0.125 2023-12-22 15:44:41,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=659666.6666666666, ans=0.125 2023-12-22 15:44:48,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.40 vs. 
limit=15.0 2023-12-22 15:45:02,660 INFO [train.py:886] (1/4) Epoch 21, batch 3650, loss[loss=0.01674, audio_tagging_loss=0.01674, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4951928.37 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0 2023-12-22 15:45:04,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.15 vs. limit=22.5 2023-12-22 15:45:07,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=659800.0, ans=0.1 2023-12-22 15:45:17,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=659866.6666666666, ans=0.05 2023-12-22 15:45:28,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=659933.3333333334, ans=0.07 2023-12-22 15:45:34,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=660000.0, ans=0.0 2023-12-22 15:45:46,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5 2023-12-22 15:45:47,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=660066.6666666666, ans=0.0 2023-12-22 15:45:54,403 INFO [train.py:886] (1/4) Epoch 21, batch 3700, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4952750.78 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:45:56,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-12-22 15:46:14,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=660200.0, ans=0.125 2023-12-22 15:46:18,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2023-12-22 15:46:19,997 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 2.928e+01 3.055e+01 3.227e+01 3.842e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 15:46:22,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=660266.6666666666, ans=0.0 2023-12-22 15:46:47,453 INFO [train.py:886] (1/4) Epoch 21, batch 3750, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4943722.93 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:46:53,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=660466.6666666666, ans=0.07 2023-12-22 15:46:53,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.21 vs. 
limit=12.0 2023-12-22 15:47:11,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=660600.0, ans=0.1 2023-12-22 15:47:22,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=660666.6666666666, ans=0.125 2023-12-22 15:47:38,461 INFO [train.py:886] (1/4) Epoch 21, batch 3800, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4941920.83 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:47:49,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2023-12-22 15:47:50,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=660866.6666666666, ans=0.0 2023-12-22 15:47:50,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=660866.6666666666, ans=0.0 2023-12-22 15:47:56,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=660866.6666666666, ans=0.02 2023-12-22 15:47:59,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660933.3333333334, ans=0.125 2023-12-22 15:48:03,543 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.617e+01 3.001e+01 3.115e+01 3.242e+01 4.083e+01, threshold=6.229e+01, percent-clipped=0.0 2023-12-22 15:48:03,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.52 vs. limit=22.5 2023-12-22 15:48:12,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=661000.0, ans=0.125 2023-12-22 15:48:13,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=661000.0, ans=0.125 2023-12-22 15:48:14,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2023-12-22 15:48:20,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=661066.6666666666, ans=0.0 2023-12-22 15:48:30,952 INFO [train.py:886] (1/4) Epoch 21, batch 3850, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4946508.89 frames. ], batch size: 99, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:48:32,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=661133.3333333334, ans=0.125 2023-12-22 15:48:40,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661200.0, ans=0.1 2023-12-22 15:49:04,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-12-22 15:49:23,641 INFO [train.py:886] (1/4) Epoch 21, batch 3900, loss[loss=0.01308, audio_tagging_loss=0.01308, over 24750.00 frames. 
], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4949779.08 frames. ], batch size: 99, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:49:43,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2023-12-22 15:49:47,899 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 2.909e+01 3.084e+01 3.230e+01 3.604e+01, threshold=6.168e+01, percent-clipped=0.0 2023-12-22 15:50:02,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=661666.6666666666, ans=0.0 2023-12-22 15:50:10,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-12-22 15:50:14,919 INFO [train.py:886] (1/4) Epoch 21, batch 3950, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4951942.99 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:50:16,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=661800.0, ans=0.0 2023-12-22 15:50:17,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=661800.0, ans=0.125 2023-12-22 15:50:22,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=661800.0, ans=0.0 2023-12-22 15:50:27,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. limit=10.0 2023-12-22 15:50:37,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=661933.3333333334, ans=0.125 2023-12-22 15:50:50,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662000.0, ans=0.1 2023-12-22 15:50:50,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=662000.0, ans=0.125 2023-12-22 15:50:52,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-12-22 15:50:56,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0 2023-12-22 15:50:59,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=662066.6666666666, ans=0.125 2023-12-22 15:50:59,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=662066.6666666666, ans=0.04949747468305833 2023-12-22 15:51:07,211 INFO [train.py:886] (1/4) Epoch 21, batch 4000, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4946268.80 frames. 
], batch size: 99, lr: 5.13e-03, grad_scale: 128.0 2023-12-22 15:51:08,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=662133.3333333334, ans=0.125 2023-12-22 15:51:10,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=662133.3333333334, ans=0.125 2023-12-22 15:51:32,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=662266.6666666666, ans=0.125 2023-12-22 15:51:33,828 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.648e+01 2.959e+01 3.062e+01 3.233e+01 3.752e+01, threshold=6.123e+01, percent-clipped=0.0 2023-12-22 15:51:35,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=662266.6666666666, ans=0.1 2023-12-22 15:51:37,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=662333.3333333334, ans=0.0 2023-12-22 15:51:51,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662400.0, ans=0.1 2023-12-22 15:51:51,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2023-12-22 15:51:59,478 INFO [train.py:886] (1/4) Epoch 21, batch 4050, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4951833.07 frames. ], batch size: 99, lr: 5.13e-03, grad_scale: 64.0 2023-12-22 15:52:00,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=662466.6666666666, ans=0.125 2023-12-22 15:52:06,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662466.6666666666, ans=0.1 2023-12-22 15:52:17,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662533.3333333334, ans=0.0 2023-12-22 15:52:36,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-12-22 15:52:40,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-22 15:52:40,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=662733.3333333334, ans=0.0 2023-12-22 15:52:45,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=15.0 2023-12-22 15:52:49,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.57 vs. limit=22.5 2023-12-22 15:52:51,198 INFO [train.py:886] (1/4) Epoch 21, batch 4100, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4947682.90 frames. 
], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:52:52,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=662800.0, ans=0.2 2023-12-22 15:52:57,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662800.0, ans=0.1 2023-12-22 15:53:17,034 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.959e+01 3.122e+01 3.290e+01 3.671e+01, threshold=6.244e+01, percent-clipped=0.0 2023-12-22 15:53:18,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=662933.3333333334, ans=0.125 2023-12-22 15:53:24,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=663000.0, ans=0.125 2023-12-22 15:53:33,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5 2023-12-22 15:53:36,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663066.6666666666, ans=0.1 2023-12-22 15:53:43,771 INFO [train.py:886] (1/4) Epoch 21, batch 4150, loss[loss=0.01662, audio_tagging_loss=0.01662, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4944146.66 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:53:47,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=663133.3333333334, ans=0.2 2023-12-22 15:53:54,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=663200.0, ans=0.0 2023-12-22 15:54:20,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663333.3333333334, ans=0.1 2023-12-22 15:54:23,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=663400.0, ans=0.0 2023-12-22 15:54:30,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=663400.0, ans=15.0 2023-12-22 15:54:35,393 INFO [train.py:886] (1/4) Epoch 21, batch 4200, loss[loss=0.0144, audio_tagging_loss=0.0144, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4946712.17 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:55:00,455 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+01 2.926e+01 3.050e+01 3.273e+01 3.755e+01, threshold=6.101e+01, percent-clipped=0.0 2023-12-22 15:55:06,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=663666.6666666666, ans=0.125 2023-12-22 15:55:13,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=663666.6666666666, ans=0.0 2023-12-22 15:55:27,278 INFO [train.py:886] (1/4) Epoch 21, batch 4250, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4947730.20 frames. 
], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:55:29,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.18 vs. limit=22.5 2023-12-22 15:55:33,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.02 vs. limit=22.5 2023-12-22 15:55:38,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=663866.6666666666, ans=0.125 2023-12-22 15:56:11,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2023-12-22 15:56:18,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=664066.6666666666, ans=0.125 2023-12-22 15:56:20,087 INFO [train.py:886] (1/4) Epoch 21, batch 4300, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4952971.03 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:56:29,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=664200.0, ans=0.125 2023-12-22 15:56:32,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=664200.0, ans=0.0 2023-12-22 15:56:38,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664200.0, ans=0.1 2023-12-22 15:56:39,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.33 vs. limit=22.5 2023-12-22 15:56:45,829 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+01 2.943e+01 3.122e+01 3.224e+01 4.020e+01, threshold=6.245e+01, percent-clipped=0.0 2023-12-22 15:56:48,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=664266.6666666666, ans=0.0 2023-12-22 15:56:56,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=664333.3333333334, ans=0.0 2023-12-22 15:57:06,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664400.0, ans=0.1 2023-12-22 15:57:10,832 INFO [train.py:886] (1/4) Epoch 21, batch 4350, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4960545.32 frames. 
], batch size: 100, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:57:14,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664466.6666666666, ans=0.125 2023-12-22 15:57:14,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=664466.6666666666, ans=0.125 2023-12-22 15:57:41,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=664666.6666666666, ans=0.0 2023-12-22 15:57:46,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.71 vs. limit=22.5 2023-12-22 15:57:56,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=664733.3333333334, ans=0.0 2023-12-22 15:58:03,334 INFO [train.py:886] (1/4) Epoch 21, batch 4400, loss[loss=0.0141, audio_tagging_loss=0.0141, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4956305.10 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:58:06,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=664800.0, ans=0.2 2023-12-22 15:58:08,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=664800.0, ans=0.5 2023-12-22 15:58:08,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-12-22 15:58:10,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664800.0, ans=0.125 2023-12-22 15:58:25,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=664933.3333333334, ans=0.125 2023-12-22 15:58:29,329 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.790e+01 3.057e+01 3.154e+01 3.271e+01 4.005e+01, threshold=6.308e+01, percent-clipped=0.0 2023-12-22 15:58:54,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2023-12-22 15:58:54,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=12.0 2023-12-22 15:58:55,006 INFO [train.py:886] (1/4) Epoch 21, batch 4450, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4945553.26 frames. 
], batch size: 99, lr: 5.12e-03, grad_scale: 64.0 2023-12-22 15:58:59,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=665133.3333333334, ans=0.125 2023-12-22 15:59:02,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=665133.3333333334, ans=0.125 2023-12-22 15:59:14,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665266.6666666666, ans=0.1 2023-12-22 15:59:17,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=665266.6666666666, ans=0.2 2023-12-22 15:59:39,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2023-12-22 15:59:39,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=665400.0, ans=0.95 2023-12-22 15:59:46,938 INFO [train.py:886] (1/4) Epoch 21, batch 4500, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4946655.68 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 15:59:57,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=665533.3333333334, ans=0.0 2023-12-22 16:00:12,610 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 2.928e+01 3.056e+01 3.221e+01 3.659e+01, threshold=6.113e+01, percent-clipped=0.0 2023-12-22 16:00:39,022 INFO [train.py:886] (1/4) Epoch 21, batch 4550, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4952922.43 frames. 
], batch size: 100, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:00:41,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=665800.0, ans=0.0 2023-12-22 16:00:42,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=665800.0, ans=0.125 2023-12-22 16:00:51,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=665866.6666666666, ans=0.125 2023-12-22 16:01:00,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=665933.3333333334, ans=0.125 2023-12-22 16:01:00,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665933.3333333334, ans=0.125 2023-12-22 16:01:08,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666000.0, ans=0.125 2023-12-22 16:01:09,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=666000.0, ans=0.2 2023-12-22 16:01:15,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=666000.0, ans=0.05 2023-12-22 16:01:16,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=666000.0, ans=0.2 2023-12-22 16:01:23,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2023-12-22 16:01:29,224 INFO [train.py:886] (1/4) Epoch 21, batch 4600, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4960076.51 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:01:39,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=666200.0, ans=0.125 2023-12-22 16:01:43,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=666200.0, ans=0.2 2023-12-22 16:01:55,669 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 2.941e+01 3.039e+01 3.146e+01 3.835e+01, threshold=6.079e+01, percent-clipped=0.0 2023-12-22 16:01:55,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666266.6666666666, ans=0.125 2023-12-22 16:01:55,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666266.6666666666, ans=0.1 2023-12-22 16:01:58,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. 
limit=6.0 2023-12-22 16:02:03,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666333.3333333334, ans=0.0 2023-12-22 16:02:06,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=666333.3333333334, ans=0.125 2023-12-22 16:02:07,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=666333.3333333334, ans=0.125 2023-12-22 16:02:08,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=666333.3333333334, ans=0.0 2023-12-22 16:02:09,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666333.3333333334, ans=0.0 2023-12-22 16:02:10,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666400.0, ans=0.125 2023-12-22 16:02:17,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=666400.0, ans=0.0 2023-12-22 16:02:21,754 INFO [train.py:886] (1/4) Epoch 21, batch 4650, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4961383.04 frames. ], batch size: 99, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:02:26,731 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:02:35,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2023-12-22 16:02:50,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=666600.0, ans=0.125 2023-12-22 16:02:58,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=666666.6666666666, ans=0.09899494936611666 2023-12-22 16:02:58,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-12-22 16:03:13,762 INFO [train.py:886] (1/4) Epoch 21, batch 4700, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4957276.50 frames. 
], batch size: 99, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:03:25,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666866.6666666666, ans=0.125 2023-12-22 16:03:27,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=666866.6666666666, ans=0.04949747468305833 2023-12-22 16:03:35,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666933.3333333334, ans=0.125 2023-12-22 16:03:37,516 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 3.014e+01 3.141e+01 3.308e+01 3.967e+01, threshold=6.283e+01, percent-clipped=0.0 2023-12-22 16:03:41,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=666933.3333333334, ans=0.125 2023-12-22 16:03:45,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667000.0, ans=0.1 2023-12-22 16:03:53,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=667066.6666666666, ans=0.05 2023-12-22 16:04:01,457 INFO [train.py:886] (1/4) Epoch 21, batch 4750, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4955877.97 frames. ], batch size: 99, lr: 5.11e-03, grad_scale: 64.0 2023-12-22 16:04:04,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=667133.3333333334, ans=0.0 2023-12-22 16:04:07,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667133.3333333334, ans=0.1 2023-12-22 16:04:07,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=667133.3333333334, ans=0.2 2023-12-22 16:04:35,442 INFO [train.py:886] (1/4) Epoch 22, batch 0, loss[loss=0.03087, audio_tagging_loss=0.03087, over 24017.00 frames. ], tot_loss[loss=0.03087, audio_tagging_loss=0.03087, over 24017.00 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 64.0 2023-12-22 16:04:35,443 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 16:04:48,901 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5751, 3.6300, 3.4456, 3.3762, 3.4661, 3.6932, 2.2505, 2.7073], device='cuda:1') 2023-12-22 16:04:55,982 INFO [train.py:917] (1/4) Epoch 22, validation: loss=0.03204, audio_tagging_loss=0.03204, over 3737520.00 frames. 
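Note on the scaling.py:213 records throughout this log: each ScheduledFloat line reports a hyperparameter value (ans) looked up from a schedule keyed on batch_count, so the same named parameter (a dropout_p, a skip_rate, a whitening_limit) drifts as training progresses. A minimal Python sketch of such a piecewise-linear schedule follows; this is an illustrative reconstruction with assumed breakpoints, not the actual icefall scaling.py implementation.

import bisect

def scheduled_float(batch_count, points):
    # points: (batch_count, value) breakpoints sorted by batch_count;
    # the returned value is linearly interpolated between breakpoints
    # and clamped to the end values outside their range.
    xs = [x for x, _ in points]
    i = bisect.bisect_right(xs, batch_count)
    if i == 0:
        return points[0][1]        # before the first breakpoint
    if i == len(points):
        return points[-1][1]       # past the last breakpoint
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# With assumed breakpoints (0.0, 0.3) -> (20000.0, 0.1), the schedule
# has long since flattened out at this point in training, matching the
# feed_forward1.out_proj.dropout_p records above that report ans=0.1
# at batch_count values in the 657k-677k range:
print(scheduled_float(667240.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1

The Whitening records follow the same pattern: a measured statistic (metric) is compared against a limit that is itself scheduled on batch count (see the whitening_limit ScheduledFloat entries, e.g. ans=15.0 at batch_count=663400.0), and the module logs when the metric approaches or exceeds the limit. The periodic optim.py:484 Clipping_scale warnings likewise report grad-norm quartiles next to a threshold that varies with those recent norm statistics rather than being a fixed constant.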
2023-12-22 16:04:55,983 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 16:04:58,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=667240.0, ans=0.125 2023-12-22 16:05:00,013 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:05:01,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=667240.0, ans=0.05 2023-12-22 16:05:17,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=667373.3333333334, ans=0.1 2023-12-22 16:05:30,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=667440.0, ans=0.125 2023-12-22 16:05:41,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.63 vs. limit=5.0 2023-12-22 16:05:47,417 INFO [train.py:886] (1/4) Epoch 22, batch 50, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.02139, audio_tagging_loss=0.02139, over 1117042.72 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0 2023-12-22 16:05:53,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667573.3333333334, ans=0.1 2023-12-22 16:05:58,091 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.665e+01 3.143e+01 3.722e+01 4.421e+01 9.512e+01, threshold=7.444e+01, percent-clipped=8.0 2023-12-22 16:06:23,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=667773.3333333334, ans=0.125 2023-12-22 16:06:23,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.59 vs. limit=22.5 2023-12-22 16:06:38,674 INFO [train.py:886] (1/4) Epoch 22, batch 100, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01849, audio_tagging_loss=0.01849, over 1971138.29 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0 2023-12-22 16:06:41,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667906.6666666666, ans=0.125 2023-12-22 16:06:55,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=667973.3333333334, ans=0.125 2023-12-22 16:07:09,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=668106.6666666666, ans=0.125 2023-12-22 16:07:13,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=668106.6666666666, ans=0.125 2023-12-22 16:07:24,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=668173.3333333334, ans=0.125 2023-12-22 16:07:30,451 INFO [train.py:886] (1/4) Epoch 22, batch 150, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 2636369.45 frames. 
], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:07:38,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=668240.0, ans=0.07 2023-12-22 16:07:41,256 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.747e+01 3.091e+01 3.297e+01 3.433e+01 3.866e+01, threshold=6.595e+01, percent-clipped=0.0 2023-12-22 16:08:01,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=15.0 2023-12-22 16:08:22,592 INFO [train.py:886] (1/4) Epoch 22, batch 200, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 3153716.11 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:08:23,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=668573.3333333334, ans=0.125 2023-12-22 16:08:28,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=668573.3333333334, ans=0.2 2023-12-22 16:08:41,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=668640.0, ans=0.0 2023-12-22 16:08:47,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=668706.6666666666, ans=0.09899494936611666 2023-12-22 16:08:47,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=668706.6666666666, ans=0.125 2023-12-22 16:08:51,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668706.6666666666, ans=0.1 2023-12-22 16:08:52,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=668773.3333333334, ans=0.125 2023-12-22 16:08:55,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=668773.3333333334, ans=0.125 2023-12-22 16:09:13,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=668906.6666666666, ans=0.0 2023-12-22 16:09:14,301 INFO [train.py:886] (1/4) Epoch 22, batch 250, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 3550780.77 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:09:17,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=668906.6666666666, ans=0.0 2023-12-22 16:09:24,422 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.578e+01 2.955e+01 3.079e+01 3.215e+01 4.174e+01, threshold=6.159e+01, percent-clipped=0.0 2023-12-22 16:09:30,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668973.3333333334, ans=0.1 2023-12-22 16:09:35,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=669040.0, ans=0.0 2023-12-22 16:09:59,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.25 vs. 
limit=22.5 2023-12-22 16:10:02,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=669173.3333333334, ans=0.125 2023-12-22 16:10:06,734 INFO [train.py:886] (1/4) Epoch 22, batch 300, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 3858234.82 frames. ], batch size: 99, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:10:15,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=669306.6666666666, ans=10.0 2023-12-22 16:10:18,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669306.6666666666, ans=0.1 2023-12-22 16:10:26,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=669373.3333333334, ans=0.0 2023-12-22 16:10:58,105 INFO [train.py:886] (1/4) Epoch 22, batch 350, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4098531.33 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:10:59,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=669573.3333333334, ans=0.125 2023-12-22 16:11:08,942 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+01 2.952e+01 3.101e+01 3.215e+01 3.819e+01, threshold=6.201e+01, percent-clipped=0.0 2023-12-22 16:11:10,080 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:11:10,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=669640.0, ans=0.0 2023-12-22 16:11:16,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=669640.0, ans=0.125 2023-12-22 16:11:39,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=669840.0, ans=0.1 2023-12-22 16:11:50,144 INFO [train.py:886] (1/4) Epoch 22, batch 400, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4289815.85 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:12:02,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2023-12-22 16:12:04,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=669973.3333333334, ans=0.0 2023-12-22 16:12:05,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=669973.3333333334, ans=0.125 2023-12-22 16:12:11,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=670040.0, ans=0.125 2023-12-22 16:12:14,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=670040.0, ans=15.0 2023-12-22 16:12:41,793 INFO [train.py:886] (1/4) Epoch 22, batch 450, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. 
], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4431858.26 frames. ], batch size: 99, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:12:41,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=670240.0, ans=0.125 2023-12-22 16:12:52,653 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.604e+01 2.929e+01 3.055e+01 3.182e+01 3.732e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 16:13:09,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-12-22 16:13:31,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=670506.6666666666, ans=0.2 2023-12-22 16:13:33,362 INFO [train.py:886] (1/4) Epoch 22, batch 500, loss[loss=0.01067, audio_tagging_loss=0.01067, over 22770.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4551790.81 frames. ], batch size: 107, lr: 4.98e-03, grad_scale: 32.0 2023-12-22 16:13:36,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=670573.3333333334, ans=0.0 2023-12-22 16:13:46,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=670640.0, ans=0.125 2023-12-22 16:14:07,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2023-12-22 16:14:13,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=670840.0, ans=0.125 2023-12-22 16:14:21,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=670840.0, ans=0.125 2023-12-22 16:14:25,948 INFO [train.py:886] (1/4) Epoch 22, batch 550, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4640951.92 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:14:36,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+01 2.918e+01 3.052e+01 3.206e+01 3.698e+01, threshold=6.105e+01, percent-clipped=0.0 2023-12-22 16:14:41,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=670973.3333333334, ans=22.5 2023-12-22 16:14:50,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=671040.0, ans=0.0 2023-12-22 16:14:53,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=671040.0, ans=0.1 2023-12-22 16:15:04,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=671106.6666666666, ans=0.125 2023-12-22 16:15:07,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-12-22 16:15:16,934 INFO [train.py:886] (1/4) Epoch 22, batch 600, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4704921.13 frames. 
], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:15:52,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2023-12-22 16:16:04,415 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:16:09,864 INFO [train.py:886] (1/4) Epoch 22, batch 650, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4756341.26 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:16:12,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=671573.3333333334, ans=0.125 2023-12-22 16:16:15,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=671573.3333333334, ans=0.025 2023-12-22 16:16:17,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=671573.3333333334, ans=0.125 2023-12-22 16:16:19,418 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 2.962e+01 3.088e+01 3.252e+01 3.665e+01, threshold=6.175e+01, percent-clipped=0.0 2023-12-22 16:16:28,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=671640.0, ans=0.125 2023-12-22 16:16:49,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=671840.0, ans=0.125 2023-12-22 16:17:01,206 INFO [train.py:886] (1/4) Epoch 22, batch 700, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4795247.97 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:17:08,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.52 vs. limit=8.0 2023-12-22 16:17:35,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-12-22 16:17:44,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=672173.3333333334, ans=0.04949747468305833 2023-12-22 16:17:50,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2023-12-22 16:17:52,036 INFO [train.py:886] (1/4) Epoch 22, batch 750, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4827000.77 frames. 
], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:17:54,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=672240.0, ans=0.0 2023-12-22 16:18:00,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=672240.0, ans=0.125 2023-12-22 16:18:02,883 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 3.002e+01 3.128e+01 3.298e+01 3.708e+01, threshold=6.256e+01, percent-clipped=0.0 2023-12-22 16:18:27,339 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.565e-03 2023-12-22 16:18:28,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-12-22 16:18:32,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672506.6666666666, ans=0.1 2023-12-22 16:18:32,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-12-22 16:18:33,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=672506.6666666666, ans=0.05 2023-12-22 16:18:45,120 INFO [train.py:886] (1/4) Epoch 22, batch 800, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4855101.72 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:18:59,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=672640.0, ans=0.0 2023-12-22 16:19:13,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=672706.6666666666, ans=0.1 2023-12-22 16:19:15,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=672773.3333333334, ans=0.0 2023-12-22 16:19:36,120 INFO [train.py:886] (1/4) Epoch 22, batch 850, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4874931.35 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:19:40,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=672906.6666666666, ans=0.125 2023-12-22 16:19:47,078 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.653e+01 2.942e+01 3.056e+01 3.166e+01 3.620e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 16:20:28,692 INFO [train.py:886] (1/4) Epoch 22, batch 900, loss[loss=0.01411, audio_tagging_loss=0.01411, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4894152.34 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:20:32,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.39 vs. 
limit=22.5 2023-12-22 16:20:50,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=673373.3333333334, ans=0.125 2023-12-22 16:21:20,361 INFO [train.py:886] (1/4) Epoch 22, batch 950, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24011.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4901195.02 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:21:24,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=673573.3333333334, ans=0.125 2023-12-22 16:21:30,888 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.985e+01 3.099e+01 3.290e+01 3.638e+01, threshold=6.198e+01, percent-clipped=0.0 2023-12-22 16:21:36,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=673640.0, ans=0.0 2023-12-22 16:22:00,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=673773.3333333334, ans=0.09899494936611666 2023-12-22 16:22:11,954 INFO [train.py:886] (1/4) Epoch 22, batch 1000, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4908670.58 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:22:15,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=673906.6666666666, ans=0.0 2023-12-22 16:22:17,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=673906.6666666666, ans=0.2 2023-12-22 16:22:18,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=673906.6666666666, ans=0.0 2023-12-22 16:22:33,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=674040.0, ans=0.0 2023-12-22 16:22:48,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=674106.6666666666, ans=0.0 2023-12-22 16:22:55,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=12.0 2023-12-22 16:23:05,201 INFO [train.py:886] (1/4) Epoch 22, batch 1050, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4919071.31 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:23:13,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=674306.6666666666, ans=0.015 2023-12-22 16:23:14,729 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 2.919e+01 3.070e+01 3.239e+01 4.205e+01, threshold=6.141e+01, percent-clipped=0.0 2023-12-22 16:23:17,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=674306.6666666666, ans=0.0 2023-12-22 16:23:36,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.67 vs. 
limit=15.0 2023-12-22 16:23:49,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=674506.6666666666, ans=0.125 2023-12-22 16:23:53,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=674506.6666666666, ans=0.035 2023-12-22 16:23:54,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=674506.6666666666, ans=10.0 2023-12-22 16:23:56,081 INFO [train.py:886] (1/4) Epoch 22, batch 1100, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4932659.04 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:24:13,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674640.0, ans=0.0 2023-12-22 16:24:17,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674706.6666666666, ans=0.125 2023-12-22 16:24:20,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=674706.6666666666, ans=0.015 2023-12-22 16:24:20,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=674706.6666666666, ans=0.0 2023-12-22 16:24:23,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=674706.6666666666, ans=0.125 2023-12-22 16:24:49,294 INFO [train.py:886] (1/4) Epoch 22, batch 1150, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4939486.34 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:24:59,418 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.584e+01 2.884e+01 2.992e+01 3.161e+01 3.623e+01, threshold=5.985e+01, percent-clipped=0.0 2023-12-22 16:25:12,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2023-12-22 16:25:21,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=675106.6666666666, ans=0.2 2023-12-22 16:25:21,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2023-12-22 16:25:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=675173.3333333334, ans=10.0 2023-12-22 16:25:40,898 INFO [train.py:886] (1/4) Epoch 22, batch 1200, loss[loss=0.01506, audio_tagging_loss=0.01506, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4943411.00 frames. 
], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:25:41,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=675240.0, ans=0.125 2023-12-22 16:25:41,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=675240.0, ans=0.2 2023-12-22 16:26:32,296 INFO [train.py:886] (1/4) Epoch 22, batch 1250, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4941475.42 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:26:40,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=675573.3333333334, ans=0.0 2023-12-22 16:26:43,014 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.009e+01 3.140e+01 3.242e+01 3.734e+01, threshold=6.280e+01, percent-clipped=0.0 2023-12-22 16:26:48,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675640.0, ans=0.1 2023-12-22 16:26:56,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675706.6666666666, ans=0.1 2023-12-22 16:26:58,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=675706.6666666666, ans=0.07 2023-12-22 16:27:02,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=675773.3333333334, ans=0.04949747468305833 2023-12-22 16:27:03,665 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:27:14,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-12-22 16:27:22,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=675840.0, ans=0.125 2023-12-22 16:27:23,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=675840.0, ans=15.0 2023-12-22 16:27:24,925 INFO [train.py:886] (1/4) Epoch 22, batch 1300, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4936169.52 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:27:51,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-12-22 16:28:02,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=676106.6666666666, ans=0.125 2023-12-22 16:28:17,149 INFO [train.py:886] (1/4) Epoch 22, batch 1350, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4941883.17 frames. 
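The optim.py:484 warnings summarize the optimizer's gradient-norm statistics: the five numbers are the min/25%/50%/75%/max quartiles of recent per-batch gradient norms, and the logged threshold is consistently `clipping_scale` times the median (in the warning above, 2.0 × 3.140e+01 = 6.280e+01). A sketch of that bookkeeping follows; the window size and the simplified percent-clipped counter are assumptions, not the optimizer's exact internals.

```python
# Sketch of the clipping bookkeeping behind the optim.py warnings: keep a
# window of recent gradient norms, report their quartiles, and clip at
# clipping_scale * median. Call clip_() after loss.backward(), before step().
import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self, params) -> float:
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms = (self.norms + [norm])[-self.window:]
        self.num_seen += 1

        q = torch.quantile(
            torch.tensor(self.norms), torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        )
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            + " ".join(f"{v:.3e}" for v in q.tolist())
            + f", threshold={threshold:.3e}"
            + f", percent-clipped={100.0 * self.num_clipped / self.num_seen:.1f}"
        )
        return norm
```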
], batch size: 99, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:28:27,392 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 2.919e+01 3.091e+01 3.263e+01 3.767e+01, threshold=6.183e+01, percent-clipped=0.0 2023-12-22 16:28:29,531 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.054e-02 2023-12-22 16:28:36,931 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:28:47,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=676440.0, ans=0.2 2023-12-22 16:29:08,764 INFO [train.py:886] (1/4) Epoch 22, batch 1400, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4944916.94 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:29:10,867 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.365e-03 2023-12-22 16:29:39,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2023-12-22 16:29:47,180 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:30:00,721 INFO [train.py:886] (1/4) Epoch 22, batch 1450, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4947250.68 frames. ], batch size: 99, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:30:10,225 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.926e+01 3.092e+01 3.201e+01 4.336e+01, threshold=6.185e+01, percent-clipped=0.0 2023-12-22 16:30:44,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=677173.3333333334, ans=0.0 2023-12-22 16:30:51,773 INFO [train.py:886] (1/4) Epoch 22, batch 1500, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4952667.23 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:30:59,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-12-22 16:30:59,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.01 vs. 
limit=15.0 2023-12-22 16:31:06,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=677306.6666666666, ans=0.0 2023-12-22 16:31:15,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=677373.3333333334, ans=0.125 2023-12-22 16:31:19,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=677373.3333333334, ans=0.125 2023-12-22 16:31:28,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=677440.0, ans=0.125 2023-12-22 16:31:35,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=677506.6666666666, ans=0.0 2023-12-22 16:31:44,052 INFO [train.py:886] (1/4) Epoch 22, batch 1550, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4951359.95 frames. ], batch size: 99, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:31:47,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.87 vs. limit=22.5 2023-12-22 16:31:49,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=677573.3333333334, ans=0.125 2023-12-22 16:31:54,083 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.056e+01 3.155e+01 3.308e+01 3.901e+01, threshold=6.310e+01, percent-clipped=0.0 2023-12-22 16:32:00,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677640.0, ans=0.0 2023-12-22 16:32:04,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2023-12-22 16:32:08,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=677706.6666666666, ans=0.0 2023-12-22 16:32:25,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-12-22 16:32:35,994 INFO [train.py:886] (1/4) Epoch 22, batch 1600, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4950132.51 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:32:40,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677906.6666666666, ans=0.1 2023-12-22 16:32:47,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=677973.3333333334, ans=0.0 2023-12-22 16:33:05,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. 
limit=6.0 2023-12-22 16:33:10,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=678106.6666666666, ans=0.125 2023-12-22 16:33:14,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=678106.6666666666, ans=0.0 2023-12-22 16:33:18,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=678173.3333333334, ans=0.125 2023-12-22 16:33:26,345 INFO [train.py:886] (1/4) Epoch 22, batch 1650, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4947574.99 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:33:26,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=678240.0, ans=0.125 2023-12-22 16:33:27,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=678240.0, ans=0.125 2023-12-22 16:33:27,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=678240.0, ans=0.2 2023-12-22 16:33:32,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-12-22 16:33:37,919 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.694e+01 2.963e+01 3.106e+01 3.211e+01 3.845e+01, threshold=6.212e+01, percent-clipped=0.0 2023-12-22 16:33:48,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678373.3333333334, ans=0.0 2023-12-22 16:33:49,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=678373.3333333334, ans=0.125 2023-12-22 16:33:51,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=678373.3333333334, ans=0.2 2023-12-22 16:34:17,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=678506.6666666666, ans=15.0 2023-12-22 16:34:19,403 INFO [train.py:886] (1/4) Epoch 22, batch 1700, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4951840.23 frames. 
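The scaling.py:1022 entries fire when a Whiten module's covariance-eccentricity metric exceeds its limit (e.g. metric=21.38 vs. limit=22.5 with 512 channels above). A metric consistent with the logged ranges is num_channels · ‖Cov‖_F² / trace(Cov)², computed per group and averaged: it equals 1.0 for a perfectly white (isotropic) covariance and approaches num_channels when the variance collapses onto a single direction. This normalization is a reconstruction from the logged values, not a quote of the implementation.

```python
# Sketch of a whitening metric consistent with the scaling.py:1022 entries.
# Returns ~1.0 for white features, ~num_channels_per_group for degenerate ones.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels are split into equal groups.
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).permute(1, 0, 2)  # (groups, frames, c)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames      # (groups, c, c)
    frob_sq = (cov ** 2).sum(dim=(1, 2))
    trace_sq = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
    metric = c * frob_sq / trace_sq.clamp(min=1e-20)
    return metric.mean().item()

x = torch.randn(10000, 64)
print(whitening_metric(x, 1))                       # ~1.0 for near-white input
print(whitening_metric(x @ torch.ones(64, 64), 1))  # ~64.0 when rank collapses
```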
], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:34:33,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678640.0, ans=0.1 2023-12-22 16:34:34,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=678640.0, ans=0.5 2023-12-22 16:34:47,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=678706.6666666666, ans=0.125 2023-12-22 16:34:54,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=678773.3333333334, ans=0.07 2023-12-22 16:35:00,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=678840.0, ans=0.125 2023-12-22 16:35:11,976 INFO [train.py:886] (1/4) Epoch 22, batch 1750, loss[loss=0.01389, audio_tagging_loss=0.01389, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4953237.72 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:35:17,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=678906.6666666666, ans=0.0 2023-12-22 16:35:21,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=678973.3333333334, ans=0.125 2023-12-22 16:35:22,225 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+01 2.914e+01 2.997e+01 3.169e+01 3.655e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 16:35:30,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=678973.3333333334, ans=0.125 2023-12-22 16:35:34,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=679040.0, ans=0.125 2023-12-22 16:35:35,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=679040.0, ans=0.0 2023-12-22 16:35:49,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=679106.6666666666, ans=0.04949747468305833 2023-12-22 16:35:54,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-12-22 16:36:03,022 INFO [train.py:886] (1/4) Epoch 22, batch 1800, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4960066.37 frames. 
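In the train.py:886 lines, `loss[...]` is the current batch and `tot_loss[...]` is a decayed, frame-weighted running average; the fractional frame counts hovering near 5e6 (with roughly 25k-frame batches) are consistent with a decay constant of about 200 batches. That decay constant is an inference from the logged numbers, not a value read from the recipe; the sketch below only demonstrates the mechanism.

```python
# Sketch of the frame-weighted loss bookkeeping behind the train.py:886 lines.
# Both accumulators decay each batch, which is why the logged frame counts
# are fractional and saturate near num_frames_per_batch / (1 - decay).

class RunningLoss:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):  # assumed constant
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of per-frame losses
        self.frames = 0.0     # decayed frame count (hence fractional)

    def update(self, batch_loss_per_frame: float, num_frames: int) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_per_frame * num_frames
        self.frames = self.frames * self.decay + num_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(1000):
    tracker.update(0.0134, 25000)
# frames saturates near 25000 * 200 = 5e6, as in the log:
print(f"tot_loss[loss={tracker.tot_loss:.4f}, over {tracker.frames:.2f} frames.]")
```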
], batch size: 100, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:36:11,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=679240.0, ans=0.0 2023-12-22 16:36:27,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=679373.3333333334, ans=0.025 2023-12-22 16:36:29,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=679373.3333333334, ans=0.125 2023-12-22 16:36:29,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=679373.3333333334, ans=0.125 2023-12-22 16:36:45,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0 2023-12-22 16:36:49,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.11 vs. limit=22.5 2023-12-22 16:36:55,363 INFO [train.py:886] (1/4) Epoch 22, batch 1850, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4963852.33 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:37:05,644 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.729e+01 2.979e+01 3.098e+01 3.249e+01 3.883e+01, threshold=6.197e+01, percent-clipped=0.0 2023-12-22 16:37:14,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-12-22 16:37:46,027 INFO [train.py:886] (1/4) Epoch 22, batch 1900, loss[loss=0.01527, audio_tagging_loss=0.01527, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4959789.51 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:37:53,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2023-12-22 16:37:57,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=679973.3333333334, ans=0.125 2023-12-22 16:38:10,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2023-12-22 16:38:13,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=680040.0, ans=0.0 2023-12-22 16:38:18,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=680106.6666666666, ans=0.04949747468305833 2023-12-22 16:38:24,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=680106.6666666666, ans=0.2 2023-12-22 16:38:39,057 INFO [train.py:886] (1/4) Epoch 22, batch 1950, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4949458.66 frames. 
], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:38:39,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=680240.0, ans=0.0 2023-12-22 16:38:41,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=680240.0, ans=0.1 2023-12-22 16:38:48,507 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.589e+01 3.054e+01 3.166e+01 3.335e+01 3.897e+01, threshold=6.333e+01, percent-clipped=0.0 2023-12-22 16:39:06,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=680373.3333333334, ans=0.0 2023-12-22 16:39:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=680506.6666666666, ans=0.125 2023-12-22 16:39:29,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2023-12-22 16:39:30,780 INFO [train.py:886] (1/4) Epoch 22, batch 2000, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4946793.41 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:39:30,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=680573.3333333334, ans=0.125 2023-12-22 16:39:58,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=680706.6666666666, ans=0.125 2023-12-22 16:40:02,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=680773.3333333334, ans=0.125 2023-12-22 16:40:22,163 INFO [train.py:886] (1/4) Epoch 22, batch 2050, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4949320.73 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 64.0 2023-12-22 16:40:33,018 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.587e+01 2.841e+01 3.013e+01 3.146e+01 3.558e+01, threshold=6.025e+01, percent-clipped=0.0 2023-12-22 16:40:41,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=680973.3333333334, ans=0.125 2023-12-22 16:40:49,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=681040.0, ans=0.0 2023-12-22 16:40:54,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=681106.6666666666, ans=0.0 2023-12-22 16:40:57,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-12-22 16:40:57,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. 
limit=6.0 2023-12-22 16:41:03,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=681173.3333333334, ans=0.0 2023-12-22 16:41:07,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681173.3333333334, ans=0.1 2023-12-22 16:41:13,765 INFO [train.py:886] (1/4) Epoch 22, batch 2100, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4944942.89 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 64.0 2023-12-22 16:41:14,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=681240.0, ans=0.125 2023-12-22 16:41:21,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=681240.0, ans=0.2 2023-12-22 16:41:22,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=681240.0, ans=0.0 2023-12-22 16:41:32,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=681306.6666666666, ans=0.07 2023-12-22 16:41:41,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=681373.3333333334, ans=0.0 2023-12-22 16:41:43,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=681440.0, ans=0.035 2023-12-22 16:42:05,482 INFO [train.py:886] (1/4) Epoch 22, batch 2150, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24034.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4944142.88 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 64.0 2023-12-22 16:42:12,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-12-22 16:42:15,655 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+01 3.015e+01 3.093e+01 3.215e+01 3.763e+01, threshold=6.186e+01, percent-clipped=0.0 2023-12-22 16:42:15,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=681640.0, ans=0.0 2023-12-22 16:42:20,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2023-12-22 16:42:38,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2023-12-22 16:42:43,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=681773.3333333334, ans=0.125 2023-12-22 16:42:44,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2023-12-22 16:42:56,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=15.0 2023-12-22 16:42:57,810 INFO [train.py:886] (1/4) Epoch 22, batch 2200, loss[loss=0.0157, audio_tagging_loss=0.0157, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4946096.06 frames. 
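The grad_scale field doubles from 32.0 to 64.0 between batches 2000 and 2050 above, which is standard torch.cuda.amp dynamic loss scaling: the scale is multiplied by growth_factor after growth_interval consecutive steps without inf/nan gradients, and cut by backoff_factor when an overflow is hit. The constructor arguments below are PyTorch's documented parameters with assumed values, not settings read from this recipe.

```python
# Sketch of the AMP loss scaling that produces the logged grad_scale values.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0 ** 5,   # would appear in the log as grad_scale: 32.0
    growth_factor=2.0,     # 32.0 -> 64.0 once growth_interval steps pass cleanly
    backoff_factor=0.5,    # halved on inf/nan gradients
    growth_interval=2000,
)

# Typical use inside a training loop (model, optimizer, loss_fn assumed):
#   with torch.cuda.amp.autocast():
#       loss = loss_fn(model(batch))
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
#   current_scale = scaler.get_scale()   # the value logged as grad_scale
```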
], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:43:08,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-12-22 16:43:11,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-12-22 16:43:18,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=682040.0, ans=0.2 2023-12-22 16:43:23,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=682040.0, ans=0.0 2023-12-22 16:43:35,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=682106.6666666666, ans=0.09899494936611666 2023-12-22 16:43:37,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=682106.6666666666, ans=0.125 2023-12-22 16:43:48,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=682240.0, ans=0.125 2023-12-22 16:43:49,671 INFO [train.py:886] (1/4) Epoch 22, batch 2250, loss[loss=0.01641, audio_tagging_loss=0.01641, over 24930.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4937065.99 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:43:50,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682240.0, ans=0.1 2023-12-22 16:43:59,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=682240.0, ans=0.125 2023-12-22 16:44:00,832 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.973e+01 3.104e+01 3.289e+01 3.674e+01, threshold=6.209e+01, percent-clipped=0.0 2023-12-22 16:44:04,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=682306.6666666666, ans=0.125 2023-12-22 16:44:05,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=682306.6666666666, ans=0.125 2023-12-22 16:44:42,457 INFO [train.py:886] (1/4) Epoch 22, batch 2300, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4942515.05 frames. 
], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:44:45,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=682573.3333333334, ans=0.125 2023-12-22 16:44:49,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=682573.3333333334, ans=0.09899494936611666 2023-12-22 16:44:53,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=682640.0, ans=0.125 2023-12-22 16:45:01,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=682640.0, ans=0.125 2023-12-22 16:45:12,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=682773.3333333334, ans=10.0 2023-12-22 16:45:26,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0 2023-12-22 16:45:28,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=682840.0, ans=0.125 2023-12-22 16:45:34,705 INFO [train.py:886] (1/4) Epoch 22, batch 2350, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4950586.81 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:45:45,018 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+01 2.951e+01 3.052e+01 3.214e+01 3.845e+01, threshold=6.104e+01, percent-clipped=0.0 2023-12-22 16:45:46,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682973.3333333334, ans=0.125 2023-12-22 16:45:49,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=682973.3333333334, ans=0.1 2023-12-22 16:45:49,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=682973.3333333334, ans=0.0 2023-12-22 16:45:54,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-12-22 16:45:59,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=683040.0, ans=0.2 2023-12-22 16:46:17,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2023-12-22 16:46:22,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=683173.3333333334, ans=15.0 2023-12-22 16:46:27,133 INFO [train.py:886] (1/4) Epoch 22, batch 2400, loss[loss=0.01099, audio_tagging_loss=0.01099, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4948844.23 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:46:52,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=683373.3333333334, ans=0.0 2023-12-22 16:47:18,262 INFO [train.py:886] (1/4) Epoch 22, batch 2450, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. 
], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4948175.05 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:47:18,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=683573.3333333334, ans=0.125 2023-12-22 16:47:24,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=683573.3333333334, ans=0.125 2023-12-22 16:47:27,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-12-22 16:47:28,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=683640.0, ans=0.2 2023-12-22 16:47:29,198 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.984e+01 3.077e+01 3.217e+01 3.781e+01, threshold=6.155e+01, percent-clipped=0.0 2023-12-22 16:47:29,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=683640.0, ans=0.035 2023-12-22 16:47:33,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683640.0, ans=0.1 2023-12-22 16:47:33,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683640.0, ans=0.1 2023-12-22 16:47:49,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2023-12-22 16:47:53,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=683773.3333333334, ans=0.0 2023-12-22 16:47:57,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683773.3333333334, ans=0.1 2023-12-22 16:48:02,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=683840.0, ans=0.0 2023-12-22 16:48:10,593 INFO [train.py:886] (1/4) Epoch 22, batch 2500, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4948294.93 frames. ], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:48:15,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=683906.6666666666, ans=0.125 2023-12-22 16:48:45,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=684106.6666666666, ans=0.2 2023-12-22 16:48:51,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=684173.3333333334, ans=0.2 2023-12-22 16:49:03,076 INFO [train.py:886] (1/4) Epoch 22, batch 2550, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4946731.93 frames. 
], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:49:04,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=684240.0, ans=0.125 2023-12-22 16:49:08,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=684240.0, ans=0.125 2023-12-22 16:49:10,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-12-22 16:49:13,124 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+01 2.972e+01 3.101e+01 3.260e+01 3.822e+01, threshold=6.203e+01, percent-clipped=0.0 2023-12-22 16:49:31,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=684373.3333333334, ans=0.125 2023-12-22 16:49:36,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=684440.0, ans=0.125 2023-12-22 16:49:54,518 INFO [train.py:886] (1/4) Epoch 22, batch 2600, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4944782.38 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:50:00,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2023-12-22 16:50:26,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-12-22 16:50:43,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=684840.0, ans=0.125 2023-12-22 16:50:46,546 INFO [train.py:886] (1/4) Epoch 22, batch 2650, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4953329.92 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:50:53,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=684906.6666666666, ans=0.04949747468305833 2023-12-22 16:50:53,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. 
limit=15.0 2023-12-22 16:50:56,720 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+01 2.949e+01 3.109e+01 3.258e+01 4.396e+01, threshold=6.219e+01, percent-clipped=0.0 2023-12-22 16:51:00,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=684973.3333333334, ans=0.0 2023-12-22 16:51:09,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=685040.0, ans=0.125 2023-12-22 16:51:13,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=685040.0, ans=0.0 2023-12-22 16:51:17,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=685106.6666666666, ans=0.125 2023-12-22 16:51:30,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=685173.3333333334, ans=0.125 2023-12-22 16:51:31,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=685173.3333333334, ans=0.0 2023-12-22 16:51:38,260 INFO [train.py:886] (1/4) Epoch 22, batch 2700, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4947059.87 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:51:45,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=685240.0, ans=0.0 2023-12-22 16:51:49,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=685306.6666666666, ans=0.125 2023-12-22 16:51:49,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-12-22 16:51:53,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=685306.6666666666, ans=0.2 2023-12-22 16:51:55,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2023-12-22 16:52:13,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=685440.0, ans=0.125 2023-12-22 16:52:22,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=685506.6666666666, ans=0.125 2023-12-22 16:52:23,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2023-12-22 16:52:29,515 INFO [train.py:886] (1/4) Epoch 22, batch 2750, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4952372.82 frames. 
], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:52:37,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=685573.3333333334, ans=0.125 2023-12-22 16:52:39,642 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.604e+01 2.941e+01 3.076e+01 3.293e+01 3.896e+01, threshold=6.152e+01, percent-clipped=0.0 2023-12-22 16:52:39,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=685640.0, ans=0.125 2023-12-22 16:52:39,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685640.0, ans=0.1 2023-12-22 16:52:41,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=685640.0, ans=0.125 2023-12-22 16:52:51,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=685706.6666666666, ans=0.1 2023-12-22 16:52:59,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685773.3333333334, ans=0.1 2023-12-22 16:53:22,636 INFO [train.py:886] (1/4) Epoch 22, batch 2800, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4947591.85 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:53:31,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=685973.3333333334, ans=0.125 2023-12-22 16:53:44,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=686040.0, ans=0.0 2023-12-22 16:53:47,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5 2023-12-22 16:54:13,863 INFO [train.py:886] (1/4) Epoch 22, batch 2850, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4943181.62 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:54:20,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=686240.0, ans=0.125 2023-12-22 16:54:23,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=686240.0, ans=0.0 2023-12-22 16:54:24,832 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.963e+01 3.132e+01 3.265e+01 3.712e+01, threshold=6.264e+01, percent-clipped=0.0 2023-12-22 16:54:34,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=686373.3333333334, ans=0.035 2023-12-22 16:54:43,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=686373.3333333334, ans=0.05 2023-12-22 16:54:52,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=686440.0, ans=0.2 2023-12-22 16:54:52,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.78 vs. 
limit=15.0 2023-12-22 16:54:54,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=686440.0, ans=0.09899494936611666 2023-12-22 16:54:58,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=686506.6666666666, ans=0.0 2023-12-22 16:54:59,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686506.6666666666, ans=0.1 2023-12-22 16:55:05,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=686573.3333333334, ans=0.2 2023-12-22 16:55:06,046 INFO [train.py:886] (1/4) Epoch 22, batch 2900, loss[loss=0.01405, audio_tagging_loss=0.01405, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4944126.30 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:55:06,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=686573.3333333334, ans=0.2 2023-12-22 16:55:11,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2023-12-22 16:55:21,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=686640.0, ans=0.0 2023-12-22 16:55:29,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=686706.6666666666, ans=0.0 2023-12-22 16:55:39,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=686773.3333333334, ans=0.125 2023-12-22 16:55:44,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=686773.3333333334, ans=0.015 2023-12-22 16:55:58,685 INFO [train.py:886] (1/4) Epoch 22, batch 2950, loss[loss=0.01584, audio_tagging_loss=0.01584, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4940103.93 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:56:08,729 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.913e+01 3.029e+01 3.205e+01 3.789e+01, threshold=6.058e+01, percent-clipped=0.0 2023-12-22 16:56:24,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=687040.0, ans=0.2 2023-12-22 16:56:25,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=687040.0, ans=0.0 2023-12-22 16:56:38,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2023-12-22 16:56:44,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=687173.3333333334, ans=0.125 2023-12-22 16:56:50,252 INFO [train.py:886] (1/4) Epoch 22, batch 3000, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4945968.00 frames. 
], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:56:50,253 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 16:57:03,744 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1234, 0.8832, 4.2419, 4.1881], device='cuda:1') 2023-12-22 16:57:11,827 INFO [train.py:917] (1/4) Epoch 22, validation: loss=0.03274, audio_tagging_loss=0.03274, over 3737520.00 frames. 2023-12-22 16:57:11,828 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 16:57:18,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=687240.0, ans=0.05 2023-12-22 16:57:19,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=687240.0, ans=0.0 2023-12-22 16:57:21,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=687306.6666666666, ans=0.025 2023-12-22 16:57:35,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-12-22 16:57:37,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=687373.3333333334, ans=0.0 2023-12-22 16:57:37,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=687373.3333333334, ans=0.125 2023-12-22 16:57:39,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=687373.3333333334, ans=0.0 2023-12-22 16:58:03,734 INFO [train.py:886] (1/4) Epoch 22, batch 3050, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4951747.08 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:58:03,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=687573.3333333334, ans=0.0 2023-12-22 16:58:13,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.728e+01 2.983e+01 3.097e+01 3.226e+01 3.702e+01, threshold=6.194e+01, percent-clipped=0.0 2023-12-22 16:58:56,223 INFO [train.py:886] (1/4) Epoch 22, batch 3100, loss[loss=0.01563, audio_tagging_loss=0.01563, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4956441.50 frames. 
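During the validation pass at batch 3000 above, zipformer.py:1858 logs `attn_weights_entropy` for a self-attention module, apparently one value per head: a near-zero entry such as 0.8832 flags a head attending almost deterministically, while values near log(num_keys) indicate diffuse attention. (The "Maximum memory allocated" line is torch.cuda.max_memory_allocated reported in MB.) A sketch of the entropy diagnostic, assuming weights shaped (num_heads, num_queries, num_keys):

```python
# Sketch of a per-head attention-entropy diagnostic like the one logged
# during validation. attn_weights is softmax output summing to 1 over keys.
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    p = attn_weights.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)   # entropy per (head, query) position
    return ent.mean(dim=-1)            # average over queries -> one value per head

weights = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(weights))   # near log(100) ~ 4.6 for diffuse attention
```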
], batch size: 99, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:58:56,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=687906.6666666666, ans=0.0 2023-12-22 16:59:05,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=687973.3333333334, ans=0.125 2023-12-22 16:59:06,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=687973.3333333334, ans=0.125 2023-12-22 16:59:38,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=688173.3333333334, ans=0.0 2023-12-22 16:59:45,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=688173.3333333334, ans=0.2 2023-12-22 16:59:48,292 INFO [train.py:886] (1/4) Epoch 22, batch 3150, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4957600.22 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:59:49,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=688240.0, ans=0.2 2023-12-22 16:59:55,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=688240.0, ans=0.125 2023-12-22 16:59:59,264 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+01 2.993e+01 3.103e+01 3.261e+01 3.891e+01, threshold=6.205e+01, percent-clipped=0.0 2023-12-22 17:00:08,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=688373.3333333334, ans=0.0 2023-12-22 17:00:17,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=688373.3333333334, ans=22.5 2023-12-22 17:00:19,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2023-12-22 17:00:21,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=688440.0, ans=0.125 2023-12-22 17:00:22,724 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:00:38,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=688506.6666666666, ans=0.0 2023-12-22 17:00:40,700 INFO [train.py:886] (1/4) Epoch 22, batch 3200, loss[loss=0.01467, audio_tagging_loss=0.01467, over 21779.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4944443.59 frames. 
], batch size: 107, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:00:53,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=688640.0, ans=0.0 2023-12-22 17:01:02,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=688706.6666666666, ans=0.0 2023-12-22 17:01:07,885 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:01:10,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=688773.3333333334, ans=0.0 2023-12-22 17:01:13,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=688773.3333333334, ans=0.0 2023-12-22 17:01:19,657 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:01:31,999 INFO [train.py:886] (1/4) Epoch 22, batch 3250, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4945831.13 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:01:32,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=688906.6666666666, ans=0.05 2023-12-22 17:01:35,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2023-12-22 17:01:42,251 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.942e+01 3.078e+01 3.201e+01 3.535e+01, threshold=6.156e+01, percent-clipped=0.0 2023-12-22 17:01:46,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-12-22 17:01:59,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-12-22 17:02:22,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=689173.3333333334, ans=0.0 2023-12-22 17:02:24,437 INFO [train.py:886] (1/4) Epoch 22, batch 3300, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4951313.45 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:02:35,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=689306.6666666666, ans=0.125 2023-12-22 17:02:44,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=689373.3333333334, ans=0.125 2023-12-22 17:03:00,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=689440.0, ans=0.2 2023-12-22 17:03:03,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=689440.0, ans=0.0 2023-12-22 17:03:16,374 INFO [train.py:886] (1/4) Epoch 22, batch 3350, loss[loss=0.00956, audio_tagging_loss=0.00956, over 22489.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4954586.22 frames. 
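The scaling.py:1118 WithLoss entries report an auxiliary penalty attached to attention-weight tensors (loss-sum is usually 0.000e+00, occasionally a few e-03). One plausible mechanism, sketched below, is an identity op whose backward pass injects the gradient of the penalty, so the penalty shapes the weights without appearing in the main loss; the abs-value penalty, its limit, and its scale here are all assumptions for illustration.

```python
# Sketch of an identity-with-auxiliary-loss op in the spirit of the logged
# WithLoss entries. Forward reports the penalty; backward adds its gradient.
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, limit: float, scale: float):
        ctx.save_for_backward(x)
        ctx.limit, ctx.scale = limit, scale
        # Assumed penalty: mass of |x| above `limit`; reported as loss-sum.
        loss_sum = (x.abs() - limit).clamp(min=0.0).sum()
        print(f"WithLoss: loss-sum={loss_sum.item():.3e}")
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (x,) = ctx.saved_tensors
        # Gradient of scale * sum(relu(|x| - limit)), added to the incoming grad.
        aux_grad = ctx.scale * x.sign() * (x.abs() > ctx.limit).to(x.dtype)
        return grad_out + aux_grad, None, None

w = torch.randn(8, requires_grad=True)
y = WithAuxLoss.apply(w, 1.0, 1.0e-4)
y.sum().backward()   # w.grad now carries the main grad plus the penalty grad
```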
], batch size: 107, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:03:19,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=689573.3333333334, ans=0.125 2023-12-22 17:03:27,220 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+01 2.983e+01 3.128e+01 3.276e+01 3.724e+01, threshold=6.256e+01, percent-clipped=0.0 2023-12-22 17:03:28,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5 2023-12-22 17:03:50,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=689773.3333333334, ans=0.0 2023-12-22 17:03:56,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=689773.3333333334, ans=0.0 2023-12-22 17:04:04,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-12-22 17:04:08,159 INFO [train.py:886] (1/4) Epoch 22, batch 3400, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4958395.90 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:04:09,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=689906.6666666666, ans=0.07 2023-12-22 17:04:11,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689906.6666666666, ans=0.1 2023-12-22 17:04:20,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689973.3333333334, ans=0.1 2023-12-22 17:04:20,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=689973.3333333334, ans=0.125 2023-12-22 17:04:24,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=689973.3333333334, ans=0.125 2023-12-22 17:04:45,162 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:05:00,556 INFO [train.py:886] (1/4) Epoch 22, batch 3450, loss[loss=0.01467, audio_tagging_loss=0.01467, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4947954.02 frames. 
], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:05:04,403 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.060e-03
2023-12-22 17:05:10,806 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 3.072e+01 3.167e+01 3.266e+01 3.818e+01, threshold=6.334e+01, percent-clipped=0.0
2023-12-22 17:05:17,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690306.6666666666, ans=0.1
2023-12-22 17:05:22,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=690373.3333333334, ans=0.09899494936611666
2023-12-22 17:05:24,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=690373.3333333334, ans=0.07
2023-12-22 17:05:45,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690506.6666666666, ans=0.1
2023-12-22 17:05:52,070 INFO [train.py:886] (1/4) Epoch 22, batch 3500, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4940734.14 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:06:35,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=690840.0, ans=0.07
2023-12-22 17:06:40,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=690840.0, ans=0.2
2023-12-22 17:06:44,594 INFO [train.py:886] (1/4) Epoch 22, batch 3550, loss[loss=0.01573, audio_tagging_loss=0.01573, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4944406.97 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:06:54,159 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 2.964e+01 3.144e+01 3.307e+01 3.937e+01, threshold=6.289e+01, percent-clipped=0.0
2023-12-22 17:07:01,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0
2023-12-22 17:07:12,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=691040.0, ans=0.0
2023-12-22 17:07:19,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=691106.6666666666, ans=0.0
2023-12-22 17:07:25,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0
2023-12-22 17:07:26,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2023-12-22 17:07:32,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=691173.3333333334, ans=0.95
2023-12-22 17:07:35,784 INFO [train.py:886] (1/4) Epoch 22, batch 3600, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4942634.98 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
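Note on the optim.py WARNING lines above: they print the quartiles (min, 25%, 50%, 75%, max) of recent gradient norms together with the clipping threshold, and throughout this log the threshold equals Clipping_scale times the median quartile (e.g. 2.0 * 3.167e+01 = 6.334e+01 at 17:05:10). The sketch below reproduces just that relationship; the function name and the way the norm history is gathered are hypothetical, not the actual icefall optim.py code.

    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        # Quartiles of the recent grad-norm history (min, 25%, 50%, 75%, max),
        # matching the five numbers printed in the WARNING lines.
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        median = q[2].item()
        # Gradients whose norm exceeds this threshold would be scaled down.
        return clipping_scale * median

    # E.g. the quartiles logged at 17:05:10 give 2.0 * 31.67 = 63.34,
    # i.e. the printed threshold=6.334e+01.
    print(clipping_threshold(torch.tensor([25.82, 30.72, 31.67, 32.66, 38.18])))

The percent-clipped figure presumably reports how often gradients actually hit the threshold; it stays at 0.0 throughout this stretch of training.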
2023-12-22 17:08:23,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=691506.6666666666, ans=0.125
2023-12-22 17:08:28,094 INFO [train.py:886] (1/4) Epoch 22, batch 3650, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4947053.36 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:08:29,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=691573.3333333334, ans=0.2
2023-12-22 17:08:29,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=691573.3333333334, ans=0.125
2023-12-22 17:08:30,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0
2023-12-22 17:08:32,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=691573.3333333334, ans=0.09899494936611666
2023-12-22 17:08:38,307 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+01 2.894e+01 3.035e+01 3.158e+01 3.520e+01, threshold=6.071e+01, percent-clipped=0.0
2023-12-22 17:08:56,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=691706.6666666666, ans=0.0
2023-12-22 17:09:09,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0
2023-12-22 17:09:11,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=691840.0, ans=0.125
2023-12-22 17:09:11,841 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.266e-03
2023-12-22 17:09:19,861 INFO [train.py:886] (1/4) Epoch 22, batch 3700, loss[loss=0.01518, audio_tagging_loss=0.01518, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4946452.58 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:09:40,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0
2023-12-22 17:09:53,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5
2023-12-22 17:09:58,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=692106.6666666666, ans=10.0
2023-12-22 17:09:59,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=692106.6666666666, ans=0.125
2023-12-22 17:10:04,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=692173.3333333334, ans=0.2
2023-12-22 17:10:09,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0
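Note on the scaling.py:213 ScheduledFloat lines: quantities such as dropout_p, the various skip_rate entries and balancer probabilities are scheduled as functions of batch_count rather than held fixed, which is why each line prints the pair (batch_count, ans) and the values drift as training proceeds. A minimal sketch of such a schedule, assuming a piecewise-linear form through (batch_count, value) breakpoints; the breakpoints shown are illustrative, and the real ScheduledFloat class in icefall's scaling.py has more machinery:

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        """Piecewise-linear interpolation through (batch_count, value)
        breakpoints, clamped at both ends."""
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return schedule[-1][1]

    # A dropout that starts at 0.3 and decays to 0.1 by 20000 batches has
    # long since reached its floor at batch_count ~ 6.9e5 (these breakpoints
    # are illustrative, not the recipe's actual schedule):
    print(scheduled_float(692640.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1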
2023-12-22 17:10:12,271 INFO [train.py:886] (1/4) Epoch 22, batch 3750, loss[loss=0.01833, audio_tagging_loss=0.01833, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4946859.94 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:10:13,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=692240.0, ans=0.2
2023-12-22 17:10:17,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=692240.0, ans=0.125
2023-12-22 17:10:20,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=692240.0, ans=0.125
2023-12-22 17:10:22,355 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+01 3.031e+01 3.113e+01 3.271e+01 3.807e+01, threshold=6.227e+01, percent-clipped=0.0
2023-12-22 17:10:28,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=692306.6666666666, ans=0.0
2023-12-22 17:10:32,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=692373.3333333334, ans=0.125
2023-12-22 17:10:41,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692373.3333333334, ans=0.1
2023-12-22 17:11:03,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=692573.3333333334, ans=0.2
2023-12-22 17:11:04,400 INFO [train.py:886] (1/4) Epoch 22, batch 3800, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4941222.17 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0
2023-12-22 17:11:04,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=692573.3333333334, ans=0.125
2023-12-22 17:11:12,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=692573.3333333334, ans=0.1
2023-12-22 17:11:17,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=692640.0, ans=0.09899494936611666
2023-12-22 17:11:20,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=692640.0, ans=0.04949747468305833
2023-12-22 17:11:20,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=692640.0, ans=0.0
2023-12-22 17:11:20,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=12.0
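Note on the scaling.py:1022 Whitening lines: each compares a whitening metric of a layer's activations against a scheduled limit (e.g. metric=2.85 vs. limit=12.0 just above), applying a corrective gradient only when the limit is exceeded. One plausible formulation of the metric, normalized so that 1.0 means a perfectly isotropic covariance and num_channels is the degenerate worst case, is sketched here; the exact expression in icefall's Whiten module may differ:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """For features x of shape (num_frames, num_channels), measure how
        far the covariance C is from a multiple of the identity:
        num_channels * trace(C @ C) / trace(C)**2 equals 1.0 for isotropic C
        and approaches num_channels when a single direction dominates."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # (D, D) sample covariance
        d = cov.shape[0]
        return d * (cov * cov).sum() / (cov.diagonal().sum() ** 2)

    # Random features are already nearly white, far below a limit of 12.0:
    print(whitening_metric(torch.randn(4000, 192)))  # ~1.05 for Gaussian noise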
2023-12-22 17:11:32,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=692706.6666666666, ans=0.0
2023-12-22 17:11:34,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=692773.3333333334, ans=0.125
2023-12-22 17:11:41,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692773.3333333334, ans=0.1
2023-12-22 17:11:46,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=692840.0, ans=0.125
2023-12-22 17:11:55,871 INFO [train.py:886] (1/4) Epoch 22, batch 3850, loss[loss=0.01574, audio_tagging_loss=0.01574, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4940155.71 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0
2023-12-22 17:11:58,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=692906.6666666666, ans=0.125
2023-12-22 17:12:06,653 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+01 3.023e+01 3.138e+01 3.280e+01 3.905e+01, threshold=6.276e+01, percent-clipped=0.0
2023-12-22 17:12:07,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0
2023-12-22 17:12:14,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=692973.3333333334, ans=0.0
2023-12-22 17:12:21,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=693040.0, ans=0.1
2023-12-22 17:12:30,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=693106.6666666666, ans=0.0
2023-12-22 17:12:33,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5
2023-12-22 17:12:47,263 INFO [train.py:886] (1/4) Epoch 22, batch 3900, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4942834.84 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0
2023-12-22 17:12:47,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=693240.0, ans=0.125
2023-12-22 17:12:57,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=693306.6666666666, ans=0.125
2023-12-22 17:13:06,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0
2023-12-22 17:13:11,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=693373.3333333334, ans=0.125
2023-12-22 17:13:17,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs.
limit=15.0 2023-12-22 17:13:22,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=693440.0, ans=0.0 2023-12-22 17:13:33,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=693506.6666666666, ans=10.0 2023-12-22 17:13:41,589 INFO [train.py:886] (1/4) Epoch 22, batch 3950, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4948134.41 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:13:51,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=693640.0, ans=0.0 2023-12-22 17:13:51,726 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 2.965e+01 3.079e+01 3.254e+01 4.090e+01, threshold=6.157e+01, percent-clipped=0.0 2023-12-22 17:13:52,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2023-12-22 17:13:55,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=693640.0, ans=0.125 2023-12-22 17:13:56,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=693640.0, ans=0.2 2023-12-22 17:14:21,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-12-22 17:14:28,770 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:14:33,369 INFO [train.py:886] (1/4) Epoch 22, batch 4000, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4950062.98 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:14:39,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=15.0 2023-12-22 17:14:51,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693973.3333333334, ans=0.125 2023-12-22 17:15:05,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694106.6666666666, ans=0.1 2023-12-22 17:15:09,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=694106.6666666666, ans=0.125 2023-12-22 17:15:13,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694106.6666666666, ans=0.1 2023-12-22 17:15:15,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=694173.3333333334, ans=0.125 2023-12-22 17:15:16,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=694173.3333333334, ans=0.125 2023-12-22 17:15:20,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=694173.3333333334, ans=0.125 2023-12-22 17:15:25,319 INFO [train.py:886] (1/4) Epoch 22, batch 4050, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4949347.84 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:15:29,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-12-22 17:15:37,179 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.013e+01 3.150e+01 3.347e+01 3.751e+01, threshold=6.299e+01, percent-clipped=0.0 2023-12-22 17:15:39,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=694306.6666666666, ans=0.0 2023-12-22 17:15:49,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=694373.3333333334, ans=0.2 2023-12-22 17:15:53,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=694373.3333333334, ans=0.0 2023-12-22 17:15:59,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=694440.0, ans=0.035 2023-12-22 17:16:02,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=694440.0, ans=0.025 2023-12-22 17:16:05,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=694440.0, ans=0.125 2023-12-22 17:16:17,448 INFO [train.py:886] (1/4) Epoch 22, batch 4100, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4942508.32 frames. 
], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:16:29,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=694640.0, ans=0.125 2023-12-22 17:16:35,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=694640.0, ans=0.125 2023-12-22 17:16:52,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-12-22 17:17:10,086 INFO [train.py:886] (1/4) Epoch 22, batch 4150, loss[loss=0.01474, audio_tagging_loss=0.01474, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4943955.61 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:17:12,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=694906.6666666666, ans=0.0 2023-12-22 17:17:13,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=694906.6666666666, ans=0.015 2023-12-22 17:17:15,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=694906.6666666666, ans=0.2 2023-12-22 17:17:21,197 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.699e+01 2.978e+01 3.136e+01 3.267e+01 3.850e+01, threshold=6.272e+01, percent-clipped=0.0 2023-12-22 17:17:21,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=694973.3333333334, ans=0.125 2023-12-22 17:17:25,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=694973.3333333334, ans=0.125 2023-12-22 17:17:25,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=694973.3333333334, ans=0.09899494936611666 2023-12-22 17:17:26,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=694973.3333333334, ans=0.05 2023-12-22 17:17:30,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695040.0, ans=0.125 2023-12-22 17:17:36,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=695040.0, ans=0.0 2023-12-22 17:17:45,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=695106.6666666666, ans=12.0 2023-12-22 17:17:55,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=695173.3333333334, ans=0.07 2023-12-22 17:18:01,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695240.0, ans=0.1 2023-12-22 17:18:01,995 INFO [train.py:886] (1/4) Epoch 22, batch 4200, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4943991.73 frames. 
], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:18:06,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=695240.0, ans=0.125 2023-12-22 17:18:14,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0 2023-12-22 17:18:27,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=695373.3333333334, ans=0.0 2023-12-22 17:18:45,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2023-12-22 17:18:52,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=695506.6666666666, ans=0.125 2023-12-22 17:18:54,183 INFO [train.py:886] (1/4) Epoch 22, batch 4250, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4949524.48 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:19:05,256 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.700e+01 2.977e+01 3.097e+01 3.231e+01 4.216e+01, threshold=6.193e+01, percent-clipped=0.0 2023-12-22 17:19:16,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=695706.6666666666, ans=0.04949747468305833 2023-12-22 17:19:19,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695706.6666666666, ans=0.125 2023-12-22 17:19:42,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=695840.0, ans=0.0 2023-12-22 17:19:44,883 INFO [train.py:886] (1/4) Epoch 22, batch 4300, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4953430.18 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:19:57,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=695973.3333333334, ans=0.0 2023-12-22 17:20:13,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=696040.0, ans=0.1 2023-12-22 17:20:14,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=696040.0, ans=0.1 2023-12-22 17:20:16,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=696106.6666666666, ans=0.125 2023-12-22 17:20:25,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=696106.6666666666, ans=0.2 2023-12-22 17:20:37,674 INFO [train.py:886] (1/4) Epoch 22, batch 4350, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4955990.84 frames. 
], batch size: 99, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:20:42,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=696240.0, ans=0.0 2023-12-22 17:20:48,814 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.006e+01 3.144e+01 3.307e+01 3.616e+01, threshold=6.289e+01, percent-clipped=0.0 2023-12-22 17:20:56,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2023-12-22 17:21:00,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=696373.3333333334, ans=0.0 2023-12-22 17:21:11,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=696440.0, ans=0.125 2023-12-22 17:21:12,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696440.0, ans=0.1 2023-12-22 17:21:29,654 INFO [train.py:886] (1/4) Epoch 22, batch 4400, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4955844.77 frames. ], batch size: 99, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:21:55,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=696706.6666666666, ans=0.125 2023-12-22 17:21:56,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=696706.6666666666, ans=0.0 2023-12-22 17:22:07,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2023-12-22 17:22:07,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-12-22 17:22:22,014 INFO [train.py:886] (1/4) Epoch 22, batch 4450, loss[loss=0.01471, audio_tagging_loss=0.01471, over 22084.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4945463.92 frames. ], batch size: 107, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:22:24,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=696906.6666666666, ans=0.125 2023-12-22 17:22:25,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=696906.6666666666, ans=0.05 2023-12-22 17:22:33,127 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.622e+01 3.004e+01 3.128e+01 3.261e+01 3.806e+01, threshold=6.255e+01, percent-clipped=0.0 2023-12-22 17:22:36,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=696973.3333333334, ans=0.0 2023-12-22 17:22:44,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. 
limit=22.5 2023-12-22 17:23:03,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=697173.3333333334, ans=0.04949747468305833 2023-12-22 17:23:13,869 INFO [train.py:886] (1/4) Epoch 22, batch 4500, loss[loss=0.01537, audio_tagging_loss=0.01537, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4945983.09 frames. ], batch size: 99, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:23:18,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=697240.0, ans=0.2 2023-12-22 17:23:24,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=697306.6666666666, ans=0.125 2023-12-22 17:23:35,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=697373.3333333334, ans=0.125 2023-12-22 17:23:37,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=697373.3333333334, ans=10.0 2023-12-22 17:24:00,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697506.6666666666, ans=0.1 2023-12-22 17:24:04,995 INFO [train.py:886] (1/4) Epoch 22, batch 4550, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4945670.45 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:24:09,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=697573.3333333334, ans=0.125 2023-12-22 17:24:14,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697573.3333333334, ans=0.1 2023-12-22 17:24:17,628 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 2.927e+01 3.030e+01 3.234e+01 3.642e+01, threshold=6.060e+01, percent-clipped=0.0 2023-12-22 17:24:18,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=697640.0, ans=0.125 2023-12-22 17:24:19,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=697640.0, ans=0.95 2023-12-22 17:24:36,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=697773.3333333334, ans=0.125 2023-12-22 17:24:58,057 INFO [train.py:886] (1/4) Epoch 22, batch 4600, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4946323.01 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:25:27,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-12-22 17:25:34,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. 
limit=10.0 2023-12-22 17:25:43,577 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:25:49,195 INFO [train.py:886] (1/4) Epoch 22, batch 4650, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4951808.34 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 32.0 2023-12-22 17:26:00,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=698306.6666666666, ans=0.125 2023-12-22 17:26:01,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=698306.6666666666, ans=0.025 2023-12-22 17:26:02,028 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.976e+01 3.135e+01 3.289e+01 3.676e+01, threshold=6.270e+01, percent-clipped=0.0 2023-12-22 17:26:23,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=698440.0, ans=0.0 2023-12-22 17:26:25,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=698440.0, ans=0.0 2023-12-22 17:26:27,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=698440.0, ans=0.0 2023-12-22 17:26:32,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=698506.6666666666, ans=0.0 2023-12-22 17:26:34,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=698506.6666666666, ans=0.125 2023-12-22 17:26:36,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-12-22 17:26:41,304 INFO [train.py:886] (1/4) Epoch 22, batch 4700, loss[loss=0.01426, audio_tagging_loss=0.01426, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4955336.07 frames. ], batch size: 99, lr: 4.88e-03, grad_scale: 32.0 2023-12-22 17:26:59,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=698706.6666666666, ans=0.2 2023-12-22 17:27:08,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-12-22 17:27:25,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=698840.0, ans=0.125 2023-12-22 17:27:28,050 INFO [train.py:886] (1/4) Epoch 22, batch 4750, loss[loss=0.01419, audio_tagging_loss=0.01419, over 24750.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4951640.80 frames. ], batch size: 99, lr: 4.87e-03, grad_scale: 32.0 2023-12-22 17:27:29,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=698906.6666666666, ans=0.125 2023-12-22 17:27:36,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. 
limit=15.0 2023-12-22 17:27:38,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=698973.3333333334, ans=0.125 2023-12-22 17:27:38,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2023-12-22 17:27:39,151 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.016e+01 3.140e+01 3.268e+01 3.852e+01, threshold=6.281e+01, percent-clipped=0.0 2023-12-22 17:27:41,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698973.3333333334, ans=0.1 2023-12-22 17:28:02,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2023-12-22 17:28:02,424 INFO [train.py:886] (1/4) Epoch 23, batch 0, loss[loss=0.03227, audio_tagging_loss=0.03227, over 21481.00 frames. ], tot_loss[loss=0.03227, audio_tagging_loss=0.03227, over 21481.00 frames. ], batch size: 107, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:28:02,424 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 17:28:23,554 INFO [train.py:917] (1/4) Epoch 23, validation: loss=0.03207, audio_tagging_loss=0.03207, over 3737520.00 frames. 2023-12-22 17:28:23,555 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 17:28:25,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699013.3333333334, ans=0.1 2023-12-22 17:28:26,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=699013.3333333334, ans=0.125 2023-12-22 17:28:39,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=699080.0, ans=0.125 2023-12-22 17:28:51,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=699146.6666666666, ans=0.035 2023-12-22 17:28:56,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699213.3333333334, ans=0.1 2023-12-22 17:29:14,253 INFO [train.py:886] (1/4) Epoch 23, batch 50, loss[loss=0.01983, audio_tagging_loss=0.01983, over 25000.00 frames. ], tot_loss[loss=0.0208, audio_tagging_loss=0.0208, over 1108597.70 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:29:21,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=699346.6666666666, ans=0.0 2023-12-22 17:29:29,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=699413.3333333334, ans=0.125 2023-12-22 17:29:31,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=699413.3333333334, ans=0.0 2023-12-22 17:29:38,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=699480.0, ans=0.0 2023-12-22 17:29:51,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. 
limit=15.0 2023-12-22 17:29:55,984 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:30:02,796 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.567e+01 3.829e+01 4.360e+01 9.695e+01, threshold=7.658e+01, percent-clipped=7.0 2023-12-22 17:30:07,313 INFO [train.py:886] (1/4) Epoch 23, batch 100, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01789, audio_tagging_loss=0.01789, over 1968198.57 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:30:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=699680.0, ans=0.05 2023-12-22 17:30:16,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=699746.6666666666, ans=0.0 2023-12-22 17:30:16,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=699746.6666666666, ans=15.0 2023-12-22 17:30:36,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-22 17:30:51,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=699946.6666666666, ans=0.125 2023-12-22 17:30:54,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=699946.6666666666, ans=0.0 2023-12-22 17:30:57,495 INFO [train.py:886] (1/4) Epoch 23, batch 150, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 2632139.77 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:31:04,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=700013.3333333334, ans=0.125 2023-12-22 17:31:06,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=700013.3333333334, ans=10.0 2023-12-22 17:31:10,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700080.0, ans=0.1 2023-12-22 17:31:15,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.47 vs. limit=22.5 2023-12-22 17:31:17,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2023-12-22 17:31:27,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. 
limit=22.5 2023-12-22 17:31:35,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700213.3333333334, ans=0.125 2023-12-22 17:31:36,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=700213.3333333334, ans=0.125 2023-12-22 17:31:45,965 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.729e+01 3.066e+01 3.193e+01 3.321e+01 3.839e+01, threshold=6.387e+01, percent-clipped=0.0 2023-12-22 17:31:47,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2023-12-22 17:31:50,607 INFO [train.py:886] (1/4) Epoch 23, batch 200, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 3152561.48 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:31:52,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2023-12-22 17:31:57,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=700346.6666666666, ans=0.015 2023-12-22 17:31:58,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=700346.6666666666, ans=0.025 2023-12-22 17:32:08,732 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:32:34,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=700613.3333333334, ans=0.125 2023-12-22 17:32:40,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700613.3333333334, ans=0.1 2023-12-22 17:32:41,866 INFO [train.py:886] (1/4) Epoch 23, batch 250, loss[loss=0.01041, audio_tagging_loss=0.01041, over 23088.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 3549543.21 frames. ], batch size: 107, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:32:50,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=700680.0, ans=0.125 2023-12-22 17:32:53,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=700746.6666666666, ans=0.2 2023-12-22 17:32:58,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.37 vs. limit=22.5 2023-12-22 17:33:00,146 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:33:02,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2023-12-22 17:33:12,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. 
limit=6.0 2023-12-22 17:33:23,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=700946.6666666666, ans=0.125 2023-12-22 17:33:25,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=700946.6666666666, ans=0.1 2023-12-22 17:33:30,257 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.814e+01 3.034e+01 3.166e+01 3.369e+01 3.955e+01, threshold=6.333e+01, percent-clipped=0.0 2023-12-22 17:33:31,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700946.6666666666, ans=0.1 2023-12-22 17:33:34,010 INFO [train.py:886] (1/4) Epoch 23, batch 300, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 3855510.94 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:33:44,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.80 vs. limit=22.5 2023-12-22 17:33:58,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=701146.6666666666, ans=0.2 2023-12-22 17:34:03,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2023-12-22 17:34:10,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=701213.3333333334, ans=0.2 2023-12-22 17:34:12,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701213.3333333334, ans=0.1 2023-12-22 17:34:16,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.28 vs. limit=10.0 2023-12-22 17:34:23,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=701280.0, ans=0.125 2023-12-22 17:34:25,915 INFO [train.py:886] (1/4) Epoch 23, batch 350, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4091298.39 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:34:29,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=701346.6666666666, ans=0.125 2023-12-22 17:34:42,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=701413.3333333334, ans=0.0 2023-12-22 17:34:59,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=701546.6666666666, ans=0.125 2023-12-22 17:35:08,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=701613.3333333334, ans=0.2 2023-12-22 17:35:13,066 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.681e+01 2.991e+01 3.090e+01 3.268e+01 3.987e+01, threshold=6.180e+01, percent-clipped=0.0 2023-12-22 17:35:16,903 INFO [train.py:886] (1/4) Epoch 23, batch 400, loss[loss=0.01573, audio_tagging_loss=0.01573, over 24750.00 frames. 
], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4275104.83 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:35:30,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=701746.6666666666, ans=0.0 2023-12-22 17:35:32,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=701746.6666666666, ans=0.2 2023-12-22 17:35:45,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=701813.3333333334, ans=0.0 2023-12-22 17:36:09,367 INFO [train.py:886] (1/4) Epoch 23, batch 450, loss[loss=0.01469, audio_tagging_loss=0.01469, over 24927.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4422581.97 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:36:16,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702013.3333333334, ans=0.1 2023-12-22 17:36:20,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5 2023-12-22 17:36:38,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=702146.6666666666, ans=0.09899494936611666 2023-12-22 17:36:51,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=702280.0, ans=0.2 2023-12-22 17:36:57,227 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.676e+01 2.894e+01 3.036e+01 3.209e+01 3.784e+01, threshold=6.073e+01, percent-clipped=0.0 2023-12-22 17:36:58,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=702280.0, ans=0.125 2023-12-22 17:37:02,434 INFO [train.py:886] (1/4) Epoch 23, batch 500, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4540901.15 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:37:17,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2023-12-22 17:37:28,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-12-22 17:37:29,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2023-12-22 17:37:46,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702613.3333333334, ans=0.1 2023-12-22 17:37:53,967 INFO [train.py:886] (1/4) Epoch 23, batch 550, loss[loss=0.01729, audio_tagging_loss=0.01729, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4638231.58 frames. 
], batch size: 100, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:37:55,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=702680.0, ans=0.0
2023-12-22 17:38:18,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0
2023-12-22 17:38:22,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=702813.3333333334, ans=0.125
2023-12-22 17:38:28,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=702880.0, ans=0.0
2023-12-22 17:38:28,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=702880.0, ans=0.0
2023-12-22 17:38:32,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=702880.0, ans=0.125
2023-12-22 17:38:41,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=702946.6666666666, ans=0.0
2023-12-22 17:38:42,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+01 2.974e+01 3.097e+01 3.242e+01 4.856e+01, threshold=6.195e+01, percent-clipped=0.0
2023-12-22 17:38:43,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=702946.6666666666, ans=0.125
2023-12-22 17:38:46,282 INFO [train.py:886] (1/4) Epoch 23, batch 600, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4712359.22 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:38:56,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=703080.0, ans=0.125
2023-12-22 17:38:59,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=703080.0, ans=0.125
2023-12-22 17:38:59,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0
2023-12-22 17:39:10,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=703146.6666666666, ans=0.05
2023-12-22 17:39:16,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=703146.6666666666, ans=0.1
2023-12-22 17:39:19,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=703213.3333333334, ans=0.0
2023-12-22 17:39:31,832 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 17:39:38,016 INFO [train.py:886] (1/4) Epoch 23, batch 650, loss[loss=0.01537, audio_tagging_loss=0.01537, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4761953.46 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:39:40,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703346.6666666666, ans=0.1
2023-12-22 17:39:43,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0
2023-12-22 17:39:45,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=703346.6666666666, ans=0.2
2023-12-22 17:40:09,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=703546.6666666666, ans=0.0
2023-12-22 17:40:25,970 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.701e+01 3.073e+01 3.203e+01 3.360e+01 3.704e+01, threshold=6.407e+01, percent-clipped=0.0
2023-12-22 17:40:29,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=703680.0, ans=0.0
2023-12-22 17:40:29,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=703680.0, ans=0.125
2023-12-22 17:40:29,832 INFO [train.py:886] (1/4) Epoch 23, batch 700, loss[loss=0.01333, audio_tagging_loss=0.01333, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4799394.24 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:40:48,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=703746.6666666666, ans=0.125
2023-12-22 17:40:49,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=703746.6666666666, ans=0.125
2023-12-22 17:41:05,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=703880.0, ans=0.125
2023-12-22 17:41:17,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=703946.6666666666, ans=0.2
2023-12-22 17:41:23,276 INFO [train.py:886] (1/4) Epoch 23, batch 750, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4832412.46 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:41:36,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. limit=15.0
2023-12-22 17:41:56,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=704213.3333333334, ans=0.2
2023-12-22 17:42:00,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=704213.3333333334, ans=0.0
2023-12-22 17:42:00,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=704213.3333333334, ans=0.0
2023-12-22 17:42:10,095 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 2.982e+01 3.107e+01 3.204e+01 3.694e+01, threshold=6.214e+01, percent-clipped=0.0
2023-12-22 17:42:13,956 INFO [train.py:886] (1/4) Epoch 23, batch 800, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4863938.81 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:42:26,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=704413.3333333334, ans=0.125
2023-12-22 17:42:32,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5
2023-12-22 17:42:39,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=704480.0, ans=0.0
2023-12-22 17:42:39,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=704480.0, ans=0.125
2023-12-22 17:43:06,734 INFO [train.py:886] (1/4) Epoch 23, batch 850, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4890563.29 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0
2023-12-22 17:43:14,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0
2023-12-22 17:43:16,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=704746.6666666666, ans=0.125
2023-12-22 17:43:17,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=704746.6666666666, ans=0.125
2023-12-22 17:43:23,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704746.6666666666, ans=0.1
2023-12-22 17:43:27,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=704813.3333333334, ans=0.125
2023-12-22 17:43:29,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=704813.3333333334, ans=0.0
2023-12-22 17:43:33,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=704813.3333333334, ans=0.0
2023-12-22 17:43:52,803 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 2.999e+01 3.165e+01 3.314e+01 4.054e+01, threshold=6.329e+01, percent-clipped=0.0
2023-12-22 17:43:58,123 INFO [train.py:886] (1/4) Epoch 23, batch 900, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4908481.77 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:44:19,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=705146.6666666666, ans=10.0
2023-12-22 17:44:49,883 INFO [train.py:886] (1/4) Epoch 23, batch 950, loss[loss=0.01343, audio_tagging_loss=0.01343, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4913612.36 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:44:53,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=705346.6666666666, ans=0.2
2023-12-22 17:44:55,855 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.371e-02
2023-12-22 17:44:59,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=705413.3333333334, ans=0.0
2023-12-22 17:45:03,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705413.3333333334, ans=0.125
2023-12-22 17:45:13,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0
2023-12-22 17:45:18,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=705480.0, ans=0.125
2023-12-22 17:45:38,082 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.739e+01 3.002e+01 3.153e+01 3.252e+01 3.769e+01, threshold=6.307e+01, percent-clipped=0.0
2023-12-22 17:45:41,957 INFO [train.py:886] (1/4) Epoch 23, batch 1000, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4920465.81 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:45:55,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=705746.6666666666, ans=0.2
2023-12-22 17:45:55,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=705746.6666666666, ans=0.0
2023-12-22 17:46:32,418 INFO [train.py:886] (1/4) Epoch 23, batch 1050, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4922421.02 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:46:36,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=706013.3333333334, ans=0.2
2023-12-22 17:46:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=706013.3333333334, ans=0.125
2023-12-22 17:46:56,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=12.0
2023-12-22 17:47:22,005 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.686e+01 2.936e+01 3.111e+01 3.242e+01 3.902e+01, threshold=6.222e+01, percent-clipped=0.0
2023-12-22 17:47:23,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=706280.0, ans=0.125
2023-12-22 17:47:25,858 INFO [train.py:886] (1/4) Epoch 23, batch 1100, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4927308.90 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:47:29,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0
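Note on the ScheduledFloat entries above: they log a regularization constant (dropout probability, skip rate, balancer bound, etc.) whose value `ans` is a function of `batch_count`, typically annealed over training. The snippet below is a minimal sketch of a piecewise-linear schedule in that spirit; the class name, the schedule points, and the exact interpolation rule are assumptions for illustration, not icefall's actual scaling.py API.

    # Minimal sketch of a piecewise-linearly scheduled float, assuming the
    # logged `ans` is interpolated between (batch_count, value) breakpoints.
    from bisect import bisect_right

    class PiecewiseScheduledFloat:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            w = (batch_count - x0) / (x1 - x0)
            return y0 + w * (y1 - y0)

    # Hypothetical example: a dropout_p annealed from 0.3 down to 0.1.
    dropout_p = PiecewiseScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(702680.0))  # -> 0.1, past the final breakpoint

This would explain why most ScheduledFloat lines late in training (batch_count > 700k here) report flat values: the schedules have long since reached their final breakpoint.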
2023-12-22 17:47:54,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=706480.0, ans=0.0
2023-12-22 17:47:57,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=706546.6666666666, ans=0.0
2023-12-22 17:48:02,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=706546.6666666666, ans=15.0
2023-12-22 17:48:07,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=706613.3333333334, ans=0.125
2023-12-22 17:48:08,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=706613.3333333334, ans=0.125
2023-12-22 17:48:10,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=706613.3333333334, ans=0.015
2023-12-22 17:48:18,185 INFO [train.py:886] (1/4) Epoch 23, batch 1150, loss[loss=0.01125, audio_tagging_loss=0.01125, over 21939.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4928485.62 frames. ], batch size: 107, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:48:22,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=706680.0, ans=0.0
2023-12-22 17:48:27,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=706746.6666666666, ans=0.1
2023-12-22 17:48:32,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706746.6666666666, ans=0.125
2023-12-22 17:48:51,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706880.0, ans=0.1
2023-12-22 17:49:05,124 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 2.995e+01 3.117e+01 3.266e+01 4.017e+01, threshold=6.234e+01, percent-clipped=0.0
2023-12-22 17:49:06,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=706946.6666666666, ans=0.0
2023-12-22 17:49:08,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=707013.3333333334, ans=0.125
2023-12-22 17:49:08,951 INFO [train.py:886] (1/4) Epoch 23, batch 1200, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4938514.40 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:49:10,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=707013.3333333334, ans=0.125
2023-12-22 17:49:11,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=707013.3333333334, ans=0.125
2023-12-22 17:49:16,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0
2023-12-22 17:49:26,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5
2023-12-22 17:49:29,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0
2023-12-22 17:49:49,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2023-12-22 17:50:01,238 INFO [train.py:886] (1/4) Epoch 23, batch 1250, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4942490.63 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:50:03,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0
2023-12-22 17:50:12,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=707413.3333333334, ans=0.125
2023-12-22 17:50:25,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=707480.0, ans=0.05
2023-12-22 17:50:46,990 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 3.099e+01 3.181e+01 3.380e+01 4.641e+01, threshold=6.362e+01, percent-clipped=0.0
2023-12-22 17:50:51,543 INFO [train.py:886] (1/4) Epoch 23, batch 1300, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4941193.69 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0
2023-12-22 17:50:51,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707680.0, ans=0.1
2023-12-22 17:51:02,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=707746.6666666666, ans=0.125
2023-12-22 17:51:03,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=707746.6666666666, ans=0.125
2023-12-22 17:51:05,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=707746.6666666666, ans=0.0
2023-12-22 17:51:06,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=707746.6666666666, ans=0.95
2023-12-22 17:51:43,551 INFO [train.py:886] (1/4) Epoch 23, batch 1350, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4945864.80 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:51:53,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0
2023-12-22 17:52:04,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=708146.6666666666, ans=10.0
2023-12-22 17:52:12,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=708146.6666666666, ans=0.125
2023-12-22 17:52:22,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=708280.0, ans=0.2
2023-12-22 17:52:31,180 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+01 2.956e+01 3.060e+01 3.186e+01 3.861e+01, threshold=6.119e+01, percent-clipped=0.0
2023-12-22 17:52:32,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=708280.0, ans=0.125
2023-12-22 17:52:34,913 INFO [train.py:886] (1/4) Epoch 23, batch 1400, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4952373.03 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:53:06,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=708546.6666666666, ans=0.125
2023-12-22 17:53:09,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=708546.6666666666, ans=0.2
2023-12-22 17:53:17,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=708613.3333333334, ans=0.0
2023-12-22 17:53:25,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=708680.0, ans=0.1
2023-12-22 17:53:25,967 INFO [train.py:886] (1/4) Epoch 23, batch 1450, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4953699.31 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:53:56,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0
2023-12-22 17:54:13,000 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 3.011e+01 3.145e+01 3.302e+01 3.909e+01, threshold=6.290e+01, percent-clipped=0.0
2023-12-22 17:54:16,877 INFO [train.py:886] (1/4) Epoch 23, batch 1500, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4956036.81 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:54:19,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0
2023-12-22 17:54:20,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=709013.3333333334, ans=0.125
2023-12-22 17:54:39,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=709146.6666666666, ans=0.125
2023-12-22 17:54:40,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=709146.6666666666, ans=0.125
2023-12-22 17:54:57,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=709280.0, ans=0.2
2023-12-22 17:55:02,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709280.0, ans=0.1
2023-12-22 17:55:08,680 INFO [train.py:886] (1/4) Epoch 23, batch 1550, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4953228.42 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:55:13,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=709346.6666666666, ans=0.0
2023-12-22 17:55:31,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=709480.0, ans=0.125
2023-12-22 17:55:47,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=709546.6666666666, ans=0.125
2023-12-22 17:55:50,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5
2023-12-22 17:55:54,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=709613.3333333334, ans=0.0
2023-12-22 17:55:55,133 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.654e+01 3.064e+01 3.162e+01 3.305e+01 3.702e+01, threshold=6.324e+01, percent-clipped=0.0
2023-12-22 17:55:56,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=709613.3333333334, ans=0.0
2023-12-22 17:55:59,696 INFO [train.py:886] (1/4) Epoch 23, batch 1600, loss[loss=0.009235, audio_tagging_loss=0.009235, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4951964.30 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0
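Note on the recurring optim.py WARNING lines: each one reports the quartiles of recently observed gradient norms (min, 25%, median, 75%, max), a clipping threshold, and the percentage of batches clipped. In these logs the threshold tracks Clipping_scale times the median (e.g. 2.0 x 3.097e+01 = 6.194e+01, matching the first warning above), so the sketch below derives it that way; treat the exact rule and the function name as assumptions for illustration rather than icefall's precise optim.py internals.

    # Sketch: summarize recent per-batch gradient norms and derive a
    # clipping threshold as clipping_scale * median (assumed rule).
    import torch

    def grad_norm_stats(norms: torch.Tensor, clipping_scale: float = 2.0):
        qs = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * qs[2].item()      # scale the median
        pct = 100.0 * (norms > threshold).float().mean().item()
        quartile_str = " ".join(f"{q.item():.3e}" for q in qs)
        print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
              f"{quartile_str}, threshold={threshold:.3e}, "
              f"percent-clipped={pct:.1f}")
        return threshold

    # Quartile values copied from the first warning in this section:
    grad_norm_stats(torch.tensor([27.10, 29.74, 30.97, 32.42, 48.56]))

The consistently tight quartiles and percent-clipped=0.0 in this section indicate the gradient norms are stable and well below the adaptive threshold at this point in training.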
2023-12-22 17:56:32,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709880.0, ans=0.1
2023-12-22 17:56:36,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=709880.0, ans=0.125
2023-12-22 17:56:37,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709880.0, ans=0.125
2023-12-22 17:56:47,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=709946.6666666666, ans=0.125
2023-12-22 17:56:48,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=709946.6666666666, ans=0.125
2023-12-22 17:56:49,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0
2023-12-22 17:56:51,517 INFO [train.py:886] (1/4) Epoch 23, batch 1650, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4951314.63 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:57:12,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=710146.6666666666, ans=0.2
2023-12-22 17:57:30,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0
2023-12-22 17:57:38,706 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.643e+01 2.998e+01 3.094e+01 3.262e+01 4.064e+01, threshold=6.189e+01, percent-clipped=0.0
2023-12-22 17:57:40,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.95 vs. limit=15.0
2023-12-22 17:57:43,156 INFO [train.py:886] (1/4) Epoch 23, batch 1700, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4948177.81 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:58:31,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=710613.3333333334, ans=0.2
2023-12-22 17:58:35,034 INFO [train.py:886] (1/4) Epoch 23, batch 1750, loss[loss=0.01445, audio_tagging_loss=0.01445, over 20902.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4941299.56 frames. ], batch size: 107, lr: 4.73e-03, grad_scale: 32.0
2023-12-22 17:58:46,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=710746.6666666666, ans=0.125
2023-12-22 17:58:54,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=710746.6666666666, ans=0.0
2023-12-22 17:58:54,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=710746.6666666666, ans=0.125
2023-12-22 17:59:06,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=710880.0, ans=0.1
2023-12-22 17:59:08,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0
2023-12-22 17:59:09,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=710880.0, ans=0.125
2023-12-22 17:59:22,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710946.6666666666, ans=0.1
2023-12-22 17:59:23,062 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 3.006e+01 3.113e+01 3.269e+01 3.526e+01, threshold=6.226e+01, percent-clipped=0.0
2023-12-22 17:59:23,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=710946.6666666666, ans=0.2
2023-12-22 17:59:25,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=710946.6666666666, ans=0.0
2023-12-22 17:59:28,209 INFO [train.py:886] (1/4) Epoch 23, batch 1800, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4946313.09 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 32.0
2023-12-22 17:59:46,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=711146.6666666666, ans=0.125
2023-12-22 18:00:05,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711213.3333333334, ans=0.1
2023-12-22 18:00:07,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=711213.3333333334, ans=0.05
2023-12-22 18:00:13,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=15.0
2023-12-22 18:00:18,270 INFO [train.py:886] (1/4) Epoch 23, batch 1850, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4950865.14 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 32.0
2023-12-22 18:00:38,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0
2023-12-22 18:00:44,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0
2023-12-22 18:01:02,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=711613.3333333334, ans=0.125
2023-12-22 18:01:05,840 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+01 3.028e+01 3.200e+01 3.336e+01 4.130e+01, threshold=6.400e+01, percent-clipped=0.0
2023-12-22 18:01:06,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=711613.3333333334, ans=0.125
2023-12-22 18:01:06,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=711613.3333333334, ans=0.125
2023-12-22 18:01:09,756 INFO [train.py:886] (1/4) Epoch 23, batch 1900, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4948629.90 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0
2023-12-22 18:01:13,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711680.0, ans=0.125
2023-12-22 18:01:19,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=711746.6666666666, ans=0.0
2023-12-22 18:01:30,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=711813.3333333334, ans=0.125
2023-12-22 18:01:40,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=711880.0, ans=0.0
2023-12-22 18:02:00,781 INFO [train.py:886] (1/4) Epoch 23, batch 1950, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4942884.33 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0
2023-12-22 18:02:22,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=712146.6666666666, ans=0.125
2023-12-22 18:02:33,649 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:02:43,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=712280.0, ans=0.0
2023-12-22 18:02:45,746 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.985e+01 3.120e+01 3.302e+01 3.747e+01, threshold=6.240e+01, percent-clipped=0.0
2023-12-22 18:02:49,580 INFO [train.py:886] (1/4) Epoch 23, batch 2000, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4947317.33 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0
2023-12-22 18:02:54,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=712346.6666666666, ans=0.125
2023-12-22 18:03:00,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=712413.3333333334, ans=0.04949747468305833
2023-12-22 18:03:17,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=712480.0, ans=0.07
2023-12-22 18:03:22,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=712546.6666666666, ans=0.125
2023-12-22 18:03:22,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=712546.6666666666, ans=0.125
2023-12-22 18:03:41,263 INFO [train.py:886] (1/4) Epoch 23, batch 2050, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4943925.51 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0
2023-12-22 18:04:00,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0
2023-12-22 18:04:04,683 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.926e-02
2023-12-22 18:04:27,757 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.995e+01 3.150e+01 3.287e+01 3.794e+01, threshold=6.300e+01, percent-clipped=0.0
2023-12-22 18:04:31,586 INFO [train.py:886] (1/4) Epoch 23, batch 2100, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4951797.43 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0
2023-12-22 18:04:34,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=713013.3333333334, ans=0.125
2023-12-22 18:04:55,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=713146.6666666666, ans=0.125
2023-12-22 18:04:55,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=713146.6666666666, ans=0.2
2023-12-22 18:05:04,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=713213.3333333334, ans=0.2
2023-12-22 18:05:20,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=713280.0, ans=0.125
2023-12-22 18:05:24,616 INFO [train.py:886] (1/4) Epoch 23, batch 2150, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4951369.24 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0
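Note on the Whitening lines: each compares a "metric" against a scheduled "limit", where the metric measures how far a module's output covariance is from a multiple of the identity (1.0 when the activations are perfectly white). One standard way to quantify this, given here purely as an illustrative sketch and not as a claim about scaling.py's exact formula, is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, computable from traces alone:

    # Sketch: a whiteness metric for activations x of shape (frames, channels).
    # metric = mean(eig(C)^2) / mean(eig(C))^2 >= 1, with equality iff C = c*I.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0, keepdim=True)        # center each channel
        c = (x.T @ x) / x.shape[0]                 # channel covariance
        mean_eig_sq = (c @ c).diagonal().mean()    # trace(C^2) / d
        mean_eig = c.diagonal().mean()             # trace(C) / d
        return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).item()

    x = torch.randn(50000, 128)                    # nearly white input
    print(whitening_metric(x))                     # ~1.0 up to sampling noise

Under this reading, a logged line like "metric=14.02 vs. limit=15.0" says the module's outputs are still well-correlated across channels but within the scheduled tolerance, while "metric=22.94 vs. limit=22.5" marks a module whose whitening constraint is currently active.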
2023-12-22 18:05:28,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=713346.6666666666, ans=0.0
2023-12-22 18:05:33,225 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:05:44,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=713480.0, ans=0.0
2023-12-22 18:06:02,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0
2023-12-22 18:06:06,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713613.3333333334, ans=0.1
2023-12-22 18:06:11,596 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.649e+01 3.004e+01 3.147e+01 3.263e+01 3.799e+01, threshold=6.294e+01, percent-clipped=0.0
2023-12-22 18:06:15,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=713680.0, ans=0.2
2023-12-22 18:06:16,123 INFO [train.py:886] (1/4) Epoch 23, batch 2200, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4948052.54 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0
2023-12-22 18:06:25,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=713746.6666666666, ans=0.125
2023-12-22 18:06:27,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=713746.6666666666, ans=0.125
2023-12-22 18:06:34,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=713813.3333333334, ans=0.2
2023-12-22 18:07:06,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=714013.3333333334, ans=0.125
2023-12-22 18:07:06,816 INFO [train.py:886] (1/4) Epoch 23, batch 2250, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4944487.36 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:07:55,155 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.733e+01 2.950e+01 3.106e+01 3.281e+01 3.764e+01, threshold=6.212e+01, percent-clipped=0.0
2023-12-22 18:07:58,978 INFO [train.py:886] (1/4) Epoch 23, batch 2300, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4939112.45 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:08:00,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=714346.6666666666, ans=0.125
2023-12-22 18:08:12,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=714413.3333333334, ans=0.125
2023-12-22 18:08:29,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=714546.6666666666, ans=0.125
2023-12-22 18:08:37,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=714546.6666666666, ans=0.125
2023-12-22 18:08:37,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=714546.6666666666, ans=0.1
2023-12-22 18:08:42,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=714613.3333333334, ans=0.125
2023-12-22 18:08:43,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=8.0
2023-12-22 18:08:49,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=714680.0, ans=0.125
2023-12-22 18:08:51,197 INFO [train.py:886] (1/4) Epoch 23, batch 2350, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4948529.71 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:09:05,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0
2023-12-22 18:09:09,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5
2023-12-22 18:09:15,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.69 vs. limit=10.0
2023-12-22 18:09:18,386 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.571e-03
2023-12-22 18:09:23,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=714880.0, ans=0.0
2023-12-22 18:09:28,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=714880.0, ans=0.07
2023-12-22 18:09:33,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=714946.6666666666, ans=0.2
2023-12-22 18:09:38,598 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.969e+01 3.079e+01 3.242e+01 3.705e+01, threshold=6.159e+01, percent-clipped=0.0
2023-12-22 18:09:43,079 INFO [train.py:886] (1/4) Epoch 23, batch 2400, loss[loss=0.01141, audio_tagging_loss=0.01141, over 22051.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4947588.19 frames. ], batch size: 107, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:09:52,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=715080.0, ans=0.0
2023-12-22 18:10:24,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=715280.0, ans=0.0
2023-12-22 18:10:35,280 INFO [train.py:886] (1/4) Epoch 23, batch 2450, loss[loss=0.01573, audio_tagging_loss=0.01573, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4946959.64 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:10:41,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715346.6666666666, ans=0.1
2023-12-22 18:10:53,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0
2023-12-22 18:10:57,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715480.0, ans=0.1
2023-12-22 18:11:08,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=715546.6666666666, ans=0.09899494936611666
2023-12-22 18:11:22,432 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+01 2.984e+01 3.127e+01 3.304e+01 3.935e+01, threshold=6.253e+01, percent-clipped=0.0
2023-12-22 18:11:26,267 INFO [train.py:886] (1/4) Epoch 23, batch 2500, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4949644.10 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:11:47,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=715813.3333333334, ans=0.125
2023-12-22 18:11:53,444 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.874e-02
2023-12-22 18:12:01,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715880.0, ans=0.1
2023-12-22 18:12:18,522 INFO [train.py:886] (1/4) Epoch 23, batch 2550, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4947303.26 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:12:22,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=716013.3333333334, ans=0.0
2023-12-22 18:12:26,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=716013.3333333334, ans=0.0
2023-12-22 18:12:29,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=716080.0, ans=0.0
2023-12-22 18:12:50,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=716213.3333333334, ans=0.0
2023-12-22 18:12:59,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=716280.0, ans=0.1
2023-12-22 18:13:05,419 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 2.995e+01 3.127e+01 3.270e+01 4.281e+01, threshold=6.254e+01, percent-clipped=0.0
2023-12-22 18:13:10,606 INFO [train.py:886] (1/4) Epoch 23, batch 2600, loss[loss=0.01159, audio_tagging_loss=0.01159, over 24045.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4945303.12 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:13:29,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=716480.0, ans=0.125
2023-12-22 18:13:45,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=716546.6666666666, ans=0.1
2023-12-22 18:13:46,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=716546.6666666666, ans=0.1
2023-12-22 18:13:49,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=716546.6666666666, ans=10.0
2023-12-22 18:13:57,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=716613.3333333334, ans=0.125
2023-12-22 18:14:00,213 INFO [train.py:886] (1/4) Epoch 23, batch 2650, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4948284.45 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0
2023-12-22 18:14:01,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716680.0, ans=0.1
2023-12-22 18:14:01,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=716680.0, ans=0.05
2023-12-22 18:14:01,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0
2023-12-22 18:14:14,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0
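Note on the train.py:886 lines: each pairs the current batch's loss (loss[...]) with tot_loss[...], a running average weighted by the number of frames accumulated so far, which is why tot_loss moves slowly while per-batch losses jump around. A small sketch of that bookkeeping follows; the class and field names are assumptions chosen to mirror the log fields, not train.py's actual variables.

    # Sketch: frame-weighted running average matching the loss[...] /
    # tot_loss[...] pairs in the log (names assumed for illustration).
    class RunningLoss:
        def __init__(self) -> None:
            self.loss_sum = 0.0   # sum of batch_loss * batch_frames
            self.frames = 0.0     # total frames accumulated

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum += batch_loss * batch_frames
            self.frames += batch_frames
            return self.loss_sum / self.frames  # current tot_loss

    tot = RunningLoss()
    tot.update(0.01205, 24750.0)                       # batch-level numbers
    print(f"tot_loss={tot.update(0.01537, 24750.0):.5f}")  # from the log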
2023-12-22 18:14:20,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=716813.3333333334, ans=0.125
2023-12-22 18:14:22,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=716813.3333333334, ans=0.125
2023-12-22 18:14:34,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.30 vs. limit=22.5
2023-12-22 18:14:48,296 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.721e+01 3.007e+01 3.105e+01 3.259e+01 4.059e+01, threshold=6.210e+01, percent-clipped=0.0
2023-12-22 18:14:52,123 INFO [train.py:886] (1/4) Epoch 23, batch 2700, loss[loss=0.01075, audio_tagging_loss=0.01075, over 24750.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4945277.38 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:15:23,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=717213.3333333334, ans=0.125
2023-12-22 18:15:25,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717213.3333333334, ans=0.1
2023-12-22 18:15:37,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=717280.0, ans=0.125
2023-12-22 18:15:42,644 INFO [train.py:886] (1/4) Epoch 23, batch 2750, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4947221.31 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:15:44,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=717346.6666666666, ans=0.0
2023-12-22 18:15:52,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0
2023-12-22 18:15:54,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0
2023-12-22 18:15:56,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=717413.3333333334, ans=0.2
2023-12-22 18:16:06,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=717480.0, ans=0.0
2023-12-22 18:16:07,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=717480.0, ans=0.125
2023-12-22 18:16:12,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=717546.6666666666, ans=0.125
2023-12-22 18:16:30,042 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+01 3.005e+01 3.171e+01 3.321e+01 3.773e+01, threshold=6.342e+01, percent-clipped=0.0
2023-12-22 18:16:30,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=12.0
2023-12-22 18:16:33,804 INFO [train.py:886] (1/4) Epoch 23, batch 2800, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4944436.55 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:16:49,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.69 vs. limit=15.0
2023-12-22 18:16:49,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=717746.6666666666, ans=0.125
2023-12-22 18:16:52,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=717746.6666666666, ans=0.125
2023-12-22 18:17:12,201 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.786e-02
2023-12-22 18:17:15,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=717946.6666666666, ans=0.1
2023-12-22 18:17:18,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0
2023-12-22 18:17:20,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=717946.6666666666, ans=0.125
2023-12-22 18:17:23,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=717946.6666666666, ans=0.125
2023-12-22 18:17:24,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=718013.3333333334, ans=0.2
2023-12-22 18:17:25,539 INFO [train.py:886] (1/4) Epoch 23, batch 2850, loss[loss=0.01025, audio_tagging_loss=0.01025, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4941195.40 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:17:35,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718080.0, ans=0.125
2023-12-22 18:17:53,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=718146.6666666666, ans=0.2
2023-12-22 18:18:01,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718213.3333333334, ans=0.1
2023-12-22 18:18:07,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718280.0, ans=0.125
2023-12-22 18:18:10,879 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.718e+01 3.010e+01 3.133e+01 3.291e+01 3.864e+01, threshold=6.266e+01, percent-clipped=0.0
2023-12-22 18:18:12,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=718280.0, ans=0.0
2023-12-22 18:18:14,650 INFO [train.py:886] (1/4) Epoch 23, batch 2900, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4945860.33 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:18:50,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=718546.6666666666, ans=0.0
2023-12-22 18:19:02,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=718613.3333333334, ans=0.07
2023-12-22 18:19:06,759 INFO [train.py:886] (1/4) Epoch 23, batch 2950, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4943119.73 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:19:07,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0
2023-12-22 18:19:18,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=718746.6666666666, ans=0.0
2023-12-22 18:19:25,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=718746.6666666666, ans=0.025
2023-12-22 18:19:43,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=718880.0, ans=0.125
2023-12-22 18:19:44,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=718880.0, ans=0.125
2023-12-22 18:19:45,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=718880.0, ans=0.0
2023-12-22 18:19:47,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0
2023-12-22 18:19:49,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=718946.6666666666, ans=15.0
2023-12-22 18:19:50,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0
2023-12-22 18:19:52,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=718946.6666666666, ans=0.125
2023-12-22 18:19:52,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718946.6666666666, ans=0.1
2023-12-22 18:19:52,813 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 2.910e+01 3.043e+01 3.179e+01 4.091e+01, threshold=6.086e+01, percent-clipped=0.0
2023-12-22 18:19:58,624 INFO [train.py:886] (1/4) Epoch 23, batch 3000, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4941140.05 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:19:58,625 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 18:20:08,565 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5914, 3.6549, 3.3323, 3.1476], device='cuda:1')
2023-12-22 18:20:11,883 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5288, 3.2950, 3.6145, 3.3277, 3.4600, 3.6761, 2.5851, 2.9017], device='cuda:1')
2023-12-22 18:20:19,163 INFO [train.py:917] (1/4) Epoch 23, validation: loss=0.03349, audio_tagging_loss=0.03349, over 3737520.00 frames.
2023-12-22 18:20:19,164 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-22 18:20:28,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719013.3333333334, ans=0.1
2023-12-22 18:20:30,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=719080.0, ans=0.125
2023-12-22 18:20:35,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=719080.0, ans=0.125
2023-12-22 18:20:44,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=719146.6666666666, ans=0.125
2023-12-22 18:20:46,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=719146.6666666666, ans=0.125
2023-12-22 18:20:49,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=719213.3333333334, ans=0.125
2023-12-22 18:21:03,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0
2023-12-22 18:21:05,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=719280.0, ans=0.125
2023-12-22 18:21:11,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=719346.6666666666, ans=0.1
2023-12-22 18:21:11,892 INFO [train.py:886] (1/4) Epoch 23, batch 3050, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4946297.99 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:21:20,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=719413.3333333334, ans=0.0
2023-12-22 18:21:36,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719480.0, ans=0.1
2023-12-22 18:21:45,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=719546.6666666666, ans=0.125
2023-12-22 18:21:58,932 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.000e+01 3.103e+01 3.222e+01 3.770e+01, threshold=6.206e+01, percent-clipped=0.0
2023-12-22 18:22:04,184 INFO [train.py:886] (1/4) Epoch 23, batch 3100, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4954858.98 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
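Note on the zipformer.py:1858 validation lines above: they report one entropy value per self-attention head, computed from the attention-weight distributions (higher values mean a head attends broadly; the maximum for n keys is ln(n)). The sketch below shows one plausible way to compute such a diagnostic; the tensor layout and the averaging over query positions are assumptions for illustration, not zipformer.py's exact code.

    # Sketch: per-head entropy of attention weights shaped
    # (num_heads, num_queries, num_keys); each row is a softmax distribution.
    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query
        return ent.mean(dim=-1)                            # average per head

    attn = torch.softmax(torch.randn(4, 10, 50), dim=-1)
    print(attn_weights_entropy(attn))  # 4 values, one per head, as in the log

On the logged scale (3-4.6 nats for these heads), no head has collapsed to a near-deterministic attention pattern at this checkpoint.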
2023-12-22 18:22:12,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2023-12-22 18:22:21,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=719746.6666666666, ans=0.125
2023-12-22 18:22:21,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0
2023-12-22 18:22:24,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=719813.3333333334, ans=0.125
2023-12-22 18:22:35,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=719880.0, ans=0.125
2023-12-22 18:22:36,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.29 vs. limit=22.5
2023-12-22 18:22:56,623 INFO [train.py:886] (1/4) Epoch 23, batch 3150, loss[loss=0.01474, audio_tagging_loss=0.01474, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4947545.88 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:22:58,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0
2023-12-22 18:23:01,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=720013.3333333334, ans=0.05
2023-12-22 18:23:01,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=720013.3333333334, ans=0.0
2023-12-22 18:23:16,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=720146.6666666666, ans=0.125
2023-12-22 18:23:20,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=720146.6666666666, ans=0.125
2023-12-22 18:23:27,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=720213.3333333334, ans=0.125
2023-12-22 18:23:43,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720280.0, ans=0.125
2023-12-22 18:23:44,196 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+01 3.010e+01 3.145e+01 3.312e+01 3.855e+01, threshold=6.290e+01, percent-clipped=0.0
2023-12-22 18:23:48,790 INFO [train.py:886] (1/4) Epoch 23, batch 3200, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4948453.46 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:24:01,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=720413.3333333334, ans=0.0
2023-12-22 18:24:20,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=720546.6666666666, ans=0.2
2023-12-22 18:24:21,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=720546.6666666666, ans=0.0
2023-12-22 18:24:34,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=720613.3333333334, ans=0.0
2023-12-22 18:24:40,418 INFO [train.py:886] (1/4) Epoch 23, batch 3250, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4947182.72 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:24:44,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0
2023-12-22 18:24:49,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=720746.6666666666, ans=0.125
2023-12-22 18:25:00,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720813.3333333334, ans=0.0
2023-12-22 18:25:14,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=720880.0, ans=0.0
2023-12-22 18:25:19,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.94 vs. limit=22.5
2023-12-22 18:25:20,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=720880.0, ans=0.09899494936611666
2023-12-22 18:25:22,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=720946.6666666666, ans=0.125
2023-12-22 18:25:28,112 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.689e+01 2.895e+01 3.079e+01 3.188e+01 4.978e+01, threshold=6.159e+01, percent-clipped=0.0
2023-12-22 18:25:30,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720946.6666666666, ans=0.0
2023-12-22 18:25:32,125 INFO [train.py:886] (1/4) Epoch 23, batch 3300, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4943201.41 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:25:48,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=721080.0, ans=0.125
2023-12-22 18:25:57,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0
2023-12-22 18:26:00,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0
2023-12-22 18:26:03,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=721213.3333333334, ans=0.035
2023-12-22 18:26:12,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0
2023-12-22 18:26:25,017 INFO [train.py:886] (1/4) Epoch 23, batch 3350, loss[loss=0.01486, audio_tagging_loss=0.01486, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4942845.85 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:26:34,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=721413.3333333334, ans=0.125
2023-12-22 18:26:48,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=721480.0, ans=0.125
2023-12-22 18:27:04,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=721546.6666666666, ans=0.125
2023-12-22 18:27:12,422 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.001e+01 3.141e+01 3.291e+01 3.725e+01, threshold=6.283e+01, percent-clipped=0.0
2023-12-22 18:27:16,201 INFO [train.py:886] (1/4) Epoch 23, batch 3400, loss[loss=0.01494, audio_tagging_loss=0.01494, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4948394.85 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:27:25,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721680.0, ans=0.1
2023-12-22 18:27:46,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=22.5
2023-12-22 18:28:01,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721946.6666666666, ans=0.1
2023-12-22 18:28:06,899 INFO [train.py:886] (1/4) Epoch 23, batch 3450, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4949028.24 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:28:08,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=722013.3333333334, ans=0.125
2023-12-22 18:28:09,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=722013.3333333334, ans=0.125
2023-12-22 18:28:46,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=722280.0, ans=0.125
2023-12-22 18:28:53,917 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 3.061e+01 3.208e+01 3.330e+01 3.695e+01, threshold=6.416e+01, percent-clipped=0.0
2023-12-22 18:28:55,098 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:28:58,440 INFO [train.py:886] (1/4) Epoch 23, batch 3500, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4947207.62 frames.
], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:29:05,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2023-12-22 18:29:31,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=722546.6666666666, ans=0.125 2023-12-22 18:29:34,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=722546.6666666666, ans=0.0 2023-12-22 18:29:39,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=722613.3333333334, ans=0.0 2023-12-22 18:29:43,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=722613.3333333334, ans=0.125 2023-12-22 18:29:48,882 INFO [train.py:886] (1/4) Epoch 23, batch 3550, loss[loss=0.009984, audio_tagging_loss=0.009984, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4947299.56 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:29:52,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722680.0, ans=0.1 2023-12-22 18:29:57,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=722680.0, ans=0.125 2023-12-22 18:30:02,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=722746.6666666666, ans=0.125 2023-12-22 18:30:07,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.00 vs. limit=22.5 2023-12-22 18:30:12,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=722813.3333333334, ans=0.09899494936611666 2023-12-22 18:30:25,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722880.0, ans=0.125 2023-12-22 18:30:26,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=722880.0, ans=0.125 2023-12-22 18:30:36,388 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.997e+01 3.123e+01 3.295e+01 3.646e+01, threshold=6.246e+01, percent-clipped=0.0 2023-12-22 18:30:40,164 INFO [train.py:886] (1/4) Epoch 23, batch 3600, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4954225.09 frames. 
], batch size: 100, lr: 4.69e-03, grad_scale: 64.0 2023-12-22 18:30:40,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=723013.3333333334, ans=0.125 2023-12-22 18:30:43,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=723013.3333333334, ans=10.0 2023-12-22 18:31:02,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=723146.6666666666, ans=0.125 2023-12-22 18:31:06,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2023-12-22 18:31:13,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=723213.3333333334, ans=0.0 2023-12-22 18:31:14,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723213.3333333334, ans=0.1 2023-12-22 18:31:20,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-12-22 18:31:31,392 INFO [train.py:886] (1/4) Epoch 23, batch 3650, loss[loss=0.01351, audio_tagging_loss=0.01351, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4952800.77 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:31:39,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723346.6666666666, ans=0.125 2023-12-22 18:31:43,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=723413.3333333334, ans=0.0 2023-12-22 18:31:45,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-12-22 18:31:49,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=723413.3333333334, ans=0.125 2023-12-22 18:31:53,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=723480.0, ans=0.07 2023-12-22 18:31:54,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=723480.0, ans=0.125 2023-12-22 18:32:03,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723546.6666666666, ans=0.125 2023-12-22 18:32:11,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=723546.6666666666, ans=0.0 2023-12-22 18:32:13,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. 
limit=15.0 2023-12-22 18:32:20,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+01 2.967e+01 3.120e+01 3.208e+01 3.587e+01, threshold=6.240e+01, percent-clipped=0.0 2023-12-22 18:32:20,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=723613.3333333334, ans=0.0 2023-12-22 18:32:21,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=723613.3333333334, ans=0.0 2023-12-22 18:32:24,015 INFO [train.py:886] (1/4) Epoch 23, batch 3700, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4944884.64 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:32:49,124 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:32:53,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723880.0, ans=0.125 2023-12-22 18:33:13,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2023-12-22 18:33:15,982 INFO [train.py:886] (1/4) Epoch 23, batch 3750, loss[loss=0.01089, audio_tagging_loss=0.01089, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4945976.00 frames. ], batch size: 99, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:33:22,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=724013.3333333334, ans=0.125 2023-12-22 18:33:22,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724013.3333333334, ans=0.1 2023-12-22 18:33:34,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=724146.6666666666, ans=0.125 2023-12-22 18:33:37,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=724146.6666666666, ans=0.0 2023-12-22 18:33:43,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=724146.6666666666, ans=0.125 2023-12-22 18:33:52,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=724213.3333333334, ans=0.125 2023-12-22 18:34:01,963 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.057e+01 3.159e+01 3.344e+01 3.975e+01, threshold=6.319e+01, percent-clipped=0.0 2023-12-22 18:34:05,796 INFO [train.py:886] (1/4) Epoch 23, batch 3800, loss[loss=0.01144, audio_tagging_loss=0.01144, over 21935.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4939715.76 frames. 
], batch size: 107, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:34:07,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=724346.6666666666, ans=0.07 2023-12-22 18:34:19,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=724413.3333333334, ans=0.0 2023-12-22 18:34:26,967 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:34:52,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=724613.3333333334, ans=0.2 2023-12-22 18:34:57,460 INFO [train.py:886] (1/4) Epoch 23, batch 3850, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4938682.33 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:35:09,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=724746.6666666666, ans=0.07 2023-12-22 18:35:24,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=724813.3333333334, ans=0.125 2023-12-22 18:35:24,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.12 vs. limit=10.0 2023-12-22 18:35:28,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=15.0 2023-12-22 18:35:43,190 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.009e+01 3.160e+01 3.363e+01 3.949e+01, threshold=6.320e+01, percent-clipped=0.0 2023-12-22 18:35:49,156 INFO [train.py:886] (1/4) Epoch 23, batch 3900, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4938521.11 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 128.0 2023-12-22 18:35:52,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=725013.3333333334, ans=0.125 2023-12-22 18:36:00,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=725080.0, ans=0.0 2023-12-22 18:36:03,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-12-22 18:36:04,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=725080.0, ans=0.2 2023-12-22 18:36:27,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-12-22 18:36:35,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725280.0, ans=0.1 2023-12-22 18:36:40,543 INFO [train.py:886] (1/4) Epoch 23, batch 3950, loss[loss=0.01275, audio_tagging_loss=0.01275, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4941675.92 frames. 
], batch size: 100, lr: 4.68e-03, grad_scale: 128.0 2023-12-22 18:36:46,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725346.6666666666, ans=0.125 2023-12-22 18:36:49,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=725413.3333333334, ans=0.05 2023-12-22 18:36:50,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2023-12-22 18:37:02,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2023-12-22 18:37:10,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=725480.0, ans=0.125 2023-12-22 18:37:18,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-12-22 18:37:19,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=725546.6666666666, ans=0.2 2023-12-22 18:37:20,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.08 vs. limit=15.0 2023-12-22 18:37:23,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2023-12-22 18:37:30,104 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+01 2.967e+01 3.138e+01 3.271e+01 3.711e+01, threshold=6.275e+01, percent-clipped=0.0 2023-12-22 18:37:33,025 INFO [train.py:886] (1/4) Epoch 23, batch 4000, loss[loss=0.0123, audio_tagging_loss=0.0123, over 21917.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4941249.23 frames. ], batch size: 107, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:37:45,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=725746.6666666666, ans=0.0 2023-12-22 18:37:48,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=725746.6666666666, ans=0.125 2023-12-22 18:38:04,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=725880.0, ans=15.0 2023-12-22 18:38:14,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=725946.6666666666, ans=0.125 2023-12-22 18:38:15,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=725946.6666666666, ans=0.125 2023-12-22 18:38:23,323 INFO [train.py:886] (1/4) Epoch 23, batch 4050, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24039.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4936461.09 frames. 
], batch size: 100, lr: 4.68e-03, grad_scale: 64.0 2023-12-22 18:38:28,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=726013.3333333334, ans=0.1 2023-12-22 18:38:36,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-12-22 18:38:50,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=726146.6666666666, ans=0.125 2023-12-22 18:39:00,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=726213.3333333334, ans=0.125 2023-12-22 18:39:00,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=726213.3333333334, ans=0.0 2023-12-22 18:39:08,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=726280.0, ans=0.95 2023-12-22 18:39:13,486 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.032e+01 3.162e+01 3.330e+01 3.866e+01, threshold=6.323e+01, percent-clipped=0.0 2023-12-22 18:39:16,374 INFO [train.py:886] (1/4) Epoch 23, batch 4100, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4939330.43 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:39:19,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=726346.6666666666, ans=0.125 2023-12-22 18:39:32,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=726413.3333333334, ans=0.035 2023-12-22 18:39:41,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=726480.0, ans=0.0 2023-12-22 18:39:44,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726480.0, ans=0.1 2023-12-22 18:39:49,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=726546.6666666666, ans=0.125 2023-12-22 18:40:08,620 INFO [train.py:886] (1/4) Epoch 23, batch 4150, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4940059.02 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:40:51,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=726946.6666666666, ans=0.05 2023-12-22 18:40:56,281 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.726e+01 3.004e+01 3.122e+01 3.283e+01 3.797e+01, threshold=6.243e+01, percent-clipped=0.0 2023-12-22 18:40:59,168 INFO [train.py:886] (1/4) Epoch 23, batch 4200, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4943105.38 frames. 
], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:41:06,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=727013.3333333334, ans=0.125 2023-12-22 18:41:12,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2023-12-22 18:41:28,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=727146.6666666666, ans=0.125 2023-12-22 18:41:30,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-12-22 18:41:52,125 INFO [train.py:886] (1/4) Epoch 23, batch 4250, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4945770.03 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:41:54,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=727346.6666666666, ans=0.125 2023-12-22 18:42:03,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=727413.3333333334, ans=0.125 2023-12-22 18:42:39,987 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+01 3.028e+01 3.146e+01 3.307e+01 3.959e+01, threshold=6.292e+01, percent-clipped=0.0 2023-12-22 18:42:43,636 INFO [train.py:886] (1/4) Epoch 23, batch 4300, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24068.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4947123.19 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:42:44,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0 2023-12-22 18:42:56,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.34 vs. limit=15.0 2023-12-22 18:43:07,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727813.3333333334, ans=0.1 2023-12-22 18:43:09,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.80 vs. limit=22.5 2023-12-22 18:43:23,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=727880.0, ans=0.0 2023-12-22 18:43:29,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=727946.6666666666, ans=0.07 2023-12-22 18:43:33,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727946.6666666666, ans=0.1 2023-12-22 18:43:35,871 INFO [train.py:886] (1/4) Epoch 23, batch 4350, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4953138.65 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:44:06,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. 
limit=10.0 2023-12-22 18:44:08,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728213.3333333334, ans=0.1 2023-12-22 18:44:09,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-12-22 18:44:11,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=728213.3333333334, ans=0.125 2023-12-22 18:44:24,891 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.796e+01 3.084e+01 3.223e+01 3.361e+01 3.747e+01, threshold=6.446e+01, percent-clipped=0.0 2023-12-22 18:44:27,824 INFO [train.py:886] (1/4) Epoch 23, batch 4400, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4954360.91 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:44:33,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728346.6666666666, ans=0.1 2023-12-22 18:44:45,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-12-22 18:44:59,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-12-22 18:45:11,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=728613.3333333334, ans=0.2 2023-12-22 18:45:15,207 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.941e-02 2023-12-22 18:45:18,740 INFO [train.py:886] (1/4) Epoch 23, batch 4450, loss[loss=0.01237, audio_tagging_loss=0.01237, over 22482.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4949290.26 frames. ], batch size: 107, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:45:41,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=728813.3333333334, ans=0.0 2023-12-22 18:45:51,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=728880.0, ans=0.1 2023-12-22 18:45:52,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728880.0, ans=0.1 2023-12-22 18:46:08,134 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.762e+01 3.007e+01 3.166e+01 3.319e+01 4.167e+01, threshold=6.332e+01, percent-clipped=0.0 2023-12-22 18:46:10,960 INFO [train.py:886] (1/4) Epoch 23, batch 4500, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4950595.54 frames. 
], batch size: 99, lr: 4.67e-03, grad_scale: 64.0 2023-12-22 18:46:36,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=729146.6666666666, ans=0.0 2023-12-22 18:46:36,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=729146.6666666666, ans=0.2 2023-12-22 18:46:39,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=729146.6666666666, ans=0.0 2023-12-22 18:46:40,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=729146.6666666666, ans=0.125 2023-12-22 18:46:57,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=729280.0, ans=0.125 2023-12-22 18:47:03,349 INFO [train.py:886] (1/4) Epoch 23, batch 4550, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4945696.45 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:47:07,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=729346.6666666666, ans=0.125 2023-12-22 18:47:11,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=729346.6666666666, ans=0.0 2023-12-22 18:47:12,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=729413.3333333334, ans=0.125 2023-12-22 18:47:25,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=729480.0, ans=22.5 2023-12-22 18:47:44,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=729613.3333333334, ans=0.125 2023-12-22 18:47:51,141 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.681e+01 2.974e+01 3.118e+01 3.247e+01 3.951e+01, threshold=6.237e+01, percent-clipped=0.0 2023-12-22 18:47:53,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=729680.0, ans=0.125 2023-12-22 18:47:54,734 INFO [train.py:886] (1/4) Epoch 23, batch 4600, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4948101.85 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:48:18,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-12-22 18:48:20,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. 
limit=6.0 2023-12-22 18:48:25,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=729880.0, ans=0.125 2023-12-22 18:48:28,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729880.0, ans=0.125 2023-12-22 18:48:35,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729946.6666666666, ans=0.1 2023-12-22 18:48:46,818 INFO [train.py:886] (1/4) Epoch 23, batch 4650, loss[loss=0.01488, audio_tagging_loss=0.01488, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4955024.01 frames. ], batch size: 100, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:49:30,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=730280.0, ans=0.125 2023-12-22 18:49:32,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730280.0, ans=0.1 2023-12-22 18:49:33,714 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+01 3.050e+01 3.178e+01 3.300e+01 3.735e+01, threshold=6.356e+01, percent-clipped=0.0 2023-12-22 18:49:36,489 INFO [train.py:886] (1/4) Epoch 23, batch 4700, loss[loss=0.01073, audio_tagging_loss=0.01073, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4956346.40 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:49:39,353 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:49:42,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-12-22 18:49:44,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=730346.6666666666, ans=0.0 2023-12-22 18:49:47,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=730413.3333333334, ans=0.125 2023-12-22 18:49:54,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730413.3333333334, ans=0.1 2023-12-22 18:50:12,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=730546.6666666666, ans=0.0 2023-12-22 18:50:23,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=730613.3333333334, ans=0.125 2023-12-22 18:50:24,716 INFO [train.py:886] (1/4) Epoch 23, batch 4750, loss[loss=0.01617, audio_tagging_loss=0.01617, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4950625.43 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0 2023-12-22 18:50:58,842 INFO [train.py:886] (1/4) Epoch 24, batch 0, loss[loss=0.03634, audio_tagging_loss=0.03634, over 21905.00 frames. ], tot_loss[loss=0.03634, audio_tagging_loss=0.03634, over 21905.00 frames. ], batch size: 107, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:50:58,843 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 18:51:19,323 INFO [train.py:917] (1/4) Epoch 24, validation: loss=0.03237, audio_tagging_loss=0.03237, over 3737520.00 frames. 
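For context on the train.py:909/917 validation records just above: a minimal sketch of a validation pass that accumulates a frame-weighted audio-tagging loss over a dev loader, which is how a single "validation: loss=... over 3737520.00 frames." figure can be produced. The batch keys and the model interface below ("features"/"labels", a model returning (mean_loss, num_frames)) are illustrative assumptions, not icefall's actual API.

```python
# Hypothetical sketch of a validation pass like the one logged above.
# Assumed interface: model(features, labels) -> (mean_loss, num_frames);
# the batch keys "features"/"labels" are placeholders, not icefall's names.
import torch

def compute_validation_loss(model, dev_loader, device="cuda:1"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            features = batch["features"].to(device)  # (N, T, 80) fbank frames
            labels = batch["labels"].to(device)      # multi-hot over 527 events
            mean_loss, num_frames = model(features, labels)
            tot_loss += mean_loss.item() * num_frames  # frame-weighted sum
            tot_frames += num_frames
    model.train()
    # Reported as e.g. "validation: loss=0.03237 ... over 3737520.00 frames."
    return tot_loss / tot_frames
```

The same frame-weighted accumulation explains why the per-batch "loss[...]" and the running "tot_loss[...]" in the train records can differ while both are averages over their respective frame counts.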
2023-12-22 18:51:19,324 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 18:51:23,213 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:51:34,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0 2023-12-22 18:51:35,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730853.3333333334, ans=0.1 2023-12-22 18:51:37,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=730853.3333333334, ans=0.0 2023-12-22 18:51:51,195 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.764e+01 3.132e+01 3.325e+01 4.669e+01 9.691e+01, threshold=6.651e+01, percent-clipped=7.0 2023-12-22 18:51:56,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=730986.6666666666, ans=0.0 2023-12-22 18:52:10,595 INFO [train.py:886] (1/4) Epoch 24, batch 50, loss[loss=0.01831, audio_tagging_loss=0.01831, over 25000.00 frames. ], tot_loss[loss=0.02115, audio_tagging_loss=0.02115, over 1116827.05 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:52:25,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=731186.6666666666, ans=0.0 2023-12-22 18:52:47,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731320.0, ans=0.1 2023-12-22 18:53:01,995 INFO [train.py:886] (1/4) Epoch 24, batch 100, loss[loss=0.01584, audio_tagging_loss=0.01584, over 25000.00 frames. ], tot_loss[loss=0.01826, audio_tagging_loss=0.01826, over 1971317.04 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:53:06,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=731453.3333333334, ans=0.125 2023-12-22 18:53:26,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=22.5 2023-12-22 18:53:34,160 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.856e+01 3.361e+01 3.546e+01 3.763e+01 4.764e+01, threshold=7.093e+01, percent-clipped=0.0 2023-12-22 18:53:38,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.18 vs. limit=12.0 2023-12-22 18:53:45,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-12-22 18:53:53,550 INFO [train.py:886] (1/4) Epoch 24, batch 150, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 2634368.76 frames. 
], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:54:06,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=731853.3333333334, ans=0.015 2023-12-22 18:54:20,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=731920.0, ans=0.0 2023-12-22 18:54:27,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=731986.6666666666, ans=0.125 2023-12-22 18:54:41,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=732053.3333333334, ans=0.2 2023-12-22 18:54:45,355 INFO [train.py:886] (1/4) Epoch 24, batch 200, loss[loss=0.01186, audio_tagging_loss=0.01186, over 23962.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 3150252.25 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0 2023-12-22 18:54:45,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=732120.0, ans=0.0 2023-12-22 18:54:52,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732120.0, ans=0.125 2023-12-22 18:54:58,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2023-12-22 18:55:17,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.730e+01 3.030e+01 3.161e+01 3.286e+01 3.848e+01, threshold=6.323e+01, percent-clipped=0.0 2023-12-22 18:55:21,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=732320.0, ans=0.125 2023-12-22 18:55:38,561 INFO [train.py:886] (1/4) Epoch 24, batch 250, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 3550486.44 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:56:02,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=732586.6666666666, ans=0.125 2023-12-22 18:56:21,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=732720.0, ans=0.125 2023-12-22 18:56:30,535 INFO [train.py:886] (1/4) Epoch 24, batch 300, loss[loss=0.01354, audio_tagging_loss=0.01354, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 3861049.19 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:56:48,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=732853.3333333334, ans=0.125 2023-12-22 18:56:53,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=732920.0, ans=0.125 2023-12-22 18:56:54,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=732920.0, ans=0.125 2023-12-22 18:56:54,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.62 vs. 
limit=15.0 2023-12-22 18:57:01,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=732986.6666666666, ans=0.125 2023-12-22 18:57:02,324 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.786e+01 3.035e+01 3.194e+01 3.337e+01 3.823e+01, threshold=6.389e+01, percent-clipped=0.0 2023-12-22 18:57:05,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732986.6666666666, ans=0.1 2023-12-22 18:57:08,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732986.6666666666, ans=0.125 2023-12-22 18:57:11,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733053.3333333334, ans=0.125 2023-12-22 18:57:21,944 INFO [train.py:886] (1/4) Epoch 24, batch 350, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4096307.17 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:57:29,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=733120.0, ans=0.2 2023-12-22 18:57:38,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=733186.6666666666, ans=0.125 2023-12-22 18:57:41,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=733186.6666666666, ans=0.0 2023-12-22 18:58:03,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733386.6666666666, ans=0.1 2023-12-22 18:58:15,028 INFO [train.py:886] (1/4) Epoch 24, batch 400, loss[loss=0.01752, audio_tagging_loss=0.01752, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4282659.25 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:58:17,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=733453.3333333334, ans=0.0 2023-12-22 18:58:18,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=733453.3333333334, ans=0.125 2023-12-22 18:58:46,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2023-12-22 18:58:47,569 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+01 2.966e+01 3.157e+01 3.316e+01 3.738e+01, threshold=6.314e+01, percent-clipped=0.0 2023-12-22 18:59:01,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=733720.0, ans=0.0 2023-12-22 18:59:05,913 INFO [train.py:886] (1/4) Epoch 24, batch 450, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4432799.52 frames. 
], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 18:59:18,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=733853.3333333334, ans=0.125 2023-12-22 18:59:42,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=733986.6666666666, ans=0.0 2023-12-22 18:59:59,016 INFO [train.py:886] (1/4) Epoch 24, batch 500, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4553146.74 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:00:14,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734186.6666666666, ans=0.125 2023-12-22 19:00:19,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734253.3333333334, ans=0.125 2023-12-22 19:00:27,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=734253.3333333334, ans=0.0 2023-12-22 19:00:30,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=734320.0, ans=0.0 2023-12-22 19:00:31,464 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.996e+01 3.110e+01 3.243e+01 4.536e+01, threshold=6.220e+01, percent-clipped=0.0 2023-12-22 19:00:37,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734320.0, ans=0.1 2023-12-22 19:00:38,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=734320.0, ans=0.0 2023-12-22 19:00:43,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734386.6666666666, ans=0.125 2023-12-22 19:00:44,990 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:00:48,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=734386.6666666666, ans=0.125 2023-12-22 19:00:50,157 INFO [train.py:886] (1/4) Epoch 24, batch 550, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4645891.72 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:00:53,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=734453.3333333334, ans=0.2 2023-12-22 19:00:59,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734453.3333333334, ans=0.125 2023-12-22 19:01:02,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-12-22 19:01:12,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=734586.6666666666, ans=0.125 2023-12-22 19:01:42,431 INFO [train.py:886] (1/4) Epoch 24, batch 600, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4716206.09 frames. 
], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:02:12,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-22 19:02:13,849 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.711e+01 3.028e+01 3.193e+01 3.328e+01 3.819e+01, threshold=6.386e+01, percent-clipped=0.0 2023-12-22 19:02:33,929 INFO [train.py:886] (1/4) Epoch 24, batch 650, loss[loss=0.01595, audio_tagging_loss=0.01595, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4763065.52 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:03:14,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2023-12-22 19:03:24,389 INFO [train.py:886] (1/4) Epoch 24, batch 700, loss[loss=0.01485, audio_tagging_loss=0.01485, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4802158.29 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0 2023-12-22 19:03:48,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=735586.6666666666, ans=0.125 2023-12-22 19:03:55,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735653.3333333334, ans=0.1 2023-12-22 19:03:55,822 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.626e+01 3.024e+01 3.156e+01 3.315e+01 3.612e+01, threshold=6.313e+01, percent-clipped=0.0 2023-12-22 19:03:58,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735653.3333333334, ans=0.1 2023-12-22 19:04:06,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=735720.0, ans=0.1 2023-12-22 19:04:15,535 INFO [train.py:886] (1/4) Epoch 24, batch 750, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4834244.79 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:04:20,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=735786.6666666666, ans=0.0 2023-12-22 19:04:32,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735853.3333333334, ans=0.1 2023-12-22 19:04:39,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5 2023-12-22 19:04:47,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=735986.6666666666, ans=0.125 2023-12-22 19:04:50,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=735986.6666666666, ans=0.125 2023-12-22 19:04:52,062 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:04:55,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.27 vs. 
limit=10.0 2023-12-22 19:04:56,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=736053.3333333334, ans=0.02 2023-12-22 19:05:00,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.80 vs. limit=10.0 2023-12-22 19:05:03,326 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:05:05,914 INFO [train.py:886] (1/4) Epoch 24, batch 800, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4863514.83 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:05:08,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5 2023-12-22 19:05:11,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=736120.0, ans=0.125 2023-12-22 19:05:16,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=736120.0, ans=0.0 2023-12-22 19:05:19,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=736186.6666666666, ans=0.2 2023-12-22 19:05:29,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736253.3333333334, ans=0.0 2023-12-22 19:05:34,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-12-22 19:05:35,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736253.3333333334, ans=0.125 2023-12-22 19:05:37,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=736320.0, ans=0.5 2023-12-22 19:05:38,414 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.955e+01 3.119e+01 3.236e+01 3.787e+01, threshold=6.239e+01, percent-clipped=0.0 2023-12-22 19:05:55,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736386.6666666666, ans=0.1 2023-12-22 19:05:58,618 INFO [train.py:886] (1/4) Epoch 24, batch 850, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4880777.65 frames. 
], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:05:59,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=736453.3333333334, ans=0.0 2023-12-22 19:06:05,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=736453.3333333334, ans=0.125 2023-12-22 19:06:07,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=736453.3333333334, ans=0.0 2023-12-22 19:06:10,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=736520.0, ans=0.125 2023-12-22 19:06:18,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=736520.0, ans=0.0 2023-12-22 19:06:24,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=736586.6666666666, ans=0.1 2023-12-22 19:06:28,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=12.0 2023-12-22 19:06:32,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=736653.3333333334, ans=0.07 2023-12-22 19:06:45,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=736720.0, ans=0.125 2023-12-22 19:06:48,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2023-12-22 19:06:50,809 INFO [train.py:886] (1/4) Epoch 24, batch 900, loss[loss=0.01491, audio_tagging_loss=0.01491, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4899292.81 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:06:51,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=736786.6666666666, ans=0.0 2023-12-22 19:06:52,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=736786.6666666666, ans=0.125 2023-12-22 19:07:00,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-22 19:07:08,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=736853.3333333334, ans=0.125 2023-12-22 19:07:19,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=15.0 2023-12-22 19:07:22,871 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.064e+01 3.189e+01 3.308e+01 4.129e+01, threshold=6.377e+01, percent-clipped=0.0 2023-12-22 19:07:42,182 INFO [train.py:886] (1/4) Epoch 24, batch 950, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4909572.91 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:07:46,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2023-12-22 19:07:55,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0 2023-12-22 19:08:10,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737253.3333333334, ans=0.1 2023-12-22 19:08:15,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=737320.0, ans=0.0 2023-12-22 19:08:18,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=737320.0, ans=0.0 2023-12-22 19:08:34,726 INFO [train.py:886] (1/4) Epoch 24, batch 1000, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4907809.40 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:09:02,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.28 vs. limit=22.5 2023-12-22 19:09:06,970 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.839e+01 3.078e+01 3.149e+01 3.357e+01 3.682e+01, threshold=6.299e+01, percent-clipped=0.0 2023-12-22 19:09:26,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-12-22 19:09:27,339 INFO [train.py:886] (1/4) Epoch 24, batch 1050, loss[loss=0.01089, audio_tagging_loss=0.01089, over 22373.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4918979.50 frames. ], batch size: 107, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:09:32,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=737786.6666666666, ans=0.0 2023-12-22 19:09:42,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=737853.3333333334, ans=0.125 2023-12-22 19:09:44,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-12-22 19:09:51,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=737920.0, ans=0.0 2023-12-22 19:09:52,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=737920.0, ans=0.125 2023-12-22 19:10:18,278 INFO [train.py:886] (1/4) Epoch 24, batch 1100, loss[loss=0.01389, audio_tagging_loss=0.01389, over 23976.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4932508.07 frames. 
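Each optim.py warning summarizes the distribution of recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus a clipping threshold, and in every entry the threshold is exactly Clipping_scale times the reported median (e.g. 6.386e+01 = 2.0 x 3.193e+01). A self-contained sketch of that policy, clipping to a multiple of the median norm over a sliding window, is below; the window length and class name are assumptions, not icefall's ScaledAdam code.

    from collections import deque

    import torch

    class MedianGradClipper:
        """Clip gradients at clipping_scale x the median of recent norms.

        Sketch of the policy suggested by the log's 'grad-norm quartiles
        ... threshold=' warnings; not the actual icefall implementation.
        """

        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def __call__(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            # Global gradient norm: 2-norm of the per-parameter norms.
            norm = torch.norm(
                torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            q = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # 2.0 x median
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold

percent-clipped=0.0 throughout this stretch says no batch actually exceeded the threshold; the warnings are periodic summaries, not overflow events.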
], batch size: 100, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:10:24,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=738120.0, ans=0.0 2023-12-22 19:10:46,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=738253.3333333334, ans=0.95 2023-12-22 19:10:49,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738320.0, ans=0.1 2023-12-22 19:10:50,309 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+01 2.982e+01 3.129e+01 3.253e+01 3.544e+01, threshold=6.257e+01, percent-clipped=0.0 2023-12-22 19:11:10,324 INFO [train.py:886] (1/4) Epoch 24, batch 1150, loss[loss=0.01319, audio_tagging_loss=0.01319, over 22231.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4937184.81 frames. ], batch size: 107, lr: 4.54e-03, grad_scale: 64.0 2023-12-22 19:11:38,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0 2023-12-22 19:11:39,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-12-22 19:12:02,049 INFO [train.py:886] (1/4) Epoch 24, batch 1200, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4939028.63 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 128.0 2023-12-22 19:12:12,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738853.3333333334, ans=0.1 2023-12-22 19:12:16,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=12.0 2023-12-22 19:12:20,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=738853.3333333334, ans=0.125 2023-12-22 19:12:28,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=738920.0, ans=0.0 2023-12-22 19:12:34,315 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.042e+01 3.211e+01 3.340e+01 3.947e+01, threshold=6.423e+01, percent-clipped=0.0 2023-12-22 19:12:34,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=738986.6666666666, ans=0.125 2023-12-22 19:12:54,874 INFO [train.py:886] (1/4) Epoch 24, batch 1250, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4933929.60 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 128.0 2023-12-22 19:13:10,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=739186.6666666666, ans=0.125 2023-12-22 19:13:20,049 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:13:22,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.93 vs. 
limit=22.5 2023-12-22 19:13:46,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.49 vs. limit=15.0 2023-12-22 19:13:47,182 INFO [train.py:886] (1/4) Epoch 24, batch 1300, loss[loss=0.01331, audio_tagging_loss=0.01331, over 22111.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4926807.53 frames. ], batch size: 107, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:14:19,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=739653.3333333334, ans=0.125 2023-12-22 19:14:19,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-12-22 19:14:20,278 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.066e+01 3.219e+01 3.358e+01 3.789e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 19:14:25,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=739653.3333333334, ans=10.0 2023-12-22 19:14:27,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=739720.0, ans=0.0 2023-12-22 19:14:30,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=739720.0, ans=0.2 2023-12-22 19:14:38,044 INFO [train.py:886] (1/4) Epoch 24, batch 1350, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4925574.51 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:14:45,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2023-12-22 19:15:02,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=739920.0, ans=0.125 2023-12-22 19:15:02,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=739920.0, ans=0.125 2023-12-22 19:15:05,664 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:15:11,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=739986.6666666666, ans=15.0 2023-12-22 19:15:16,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739986.6666666666, ans=0.1 2023-12-22 19:15:21,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=740053.3333333334, ans=0.125 2023-12-22 19:15:27,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740053.3333333334, ans=0.1 2023-12-22 19:15:30,562 INFO [train.py:886] (1/4) Epoch 24, batch 1400, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4934101.24 frames. 
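The scaling.py Whitening lines fire when an anisotropy metric measured on a module's activations approaches or exceeds its limit (e.g. metric=22.72 vs. limit=22.5 just above), at which point the whitening constraint pushes the feature covariance back toward a multiple of the identity. One plausible definition of such a metric, assumed here and not guaranteed to match icefall's exact normalization, is mean(lambda^2) / mean(lambda)^2 over the eigenvalues lambda of the per-group channel covariance: it equals 1.0 for perfectly white features and grows as variance concentrates in fewer directions.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Covariance anisotropy: mean(lambda^2) / mean(lambda)^2.

        x: (num_frames, num_channels). Returns ~1.0 for white features,
        larger when variance concentrates in a few directions. A sketch
        under the stated assumption, not icefall's code.
        """
        n, c = x.shape
        assert c % num_groups == 0
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)            # center per group
        cov = torch.matmul(x.transpose(1, 2), x) / n   # (groups, d, d)
        d = cov.shape[-1]
        mean_eig = cov.diagonal(dim1=-2, dim2=-1).sum(-1) / d  # trace(C)/d
        mean_eig_sq = (cov * cov).sum(dim=(-2, -1)) / d        # trace(C^2)/d
        return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).mean().item()

    x = torch.randn(10000, 512)     # near-white features
    print(whitening_metric(x))      # close to 1.0
    x[:, 0] *= 30.0                 # one dominant direction
    print(whitening_metric(x))      # far above typical limits like 15.0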
], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:15:46,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=740186.6666666666, ans=0.125 2023-12-22 19:16:03,680 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 2.993e+01 3.155e+01 3.299e+01 3.866e+01, threshold=6.310e+01, percent-clipped=0.0 2023-12-22 19:16:22,184 INFO [train.py:886] (1/4) Epoch 24, batch 1450, loss[loss=0.01491, audio_tagging_loss=0.01491, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4943266.62 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:16:37,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=740520.0, ans=0.125 2023-12-22 19:16:40,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=740586.6666666666, ans=0.5 2023-12-22 19:16:51,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=740653.3333333334, ans=0.0 2023-12-22 19:17:06,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740720.0, ans=0.125 2023-12-22 19:17:10,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=740720.0, ans=0.2 2023-12-22 19:17:10,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=740720.0, ans=10.0 2023-12-22 19:17:13,104 INFO [train.py:886] (1/4) Epoch 24, batch 1500, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4950454.95 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:17:25,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740853.3333333334, ans=0.1 2023-12-22 19:17:31,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.71 vs. limit=22.5 2023-12-22 19:17:32,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=740853.3333333334, ans=0.125 2023-12-22 19:17:46,189 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.063e+01 3.189e+01 3.318e+01 3.904e+01, threshold=6.378e+01, percent-clipped=0.0 2023-12-22 19:17:48,349 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.569e-03 2023-12-22 19:17:50,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=740986.6666666666, ans=0.125 2023-12-22 19:17:50,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=740986.6666666666, ans=0.0 2023-12-22 19:18:05,233 INFO [train.py:886] (1/4) Epoch 24, batch 1550, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4954117.81 frames. 
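Each train.py progress line carries two figures: loss[...] for the current batch and tot_loss[...] aggregated "over" roughly 4.9 million frames. That frame total hovers in place instead of growing with the epoch, which suggests a frame-weighted average with exponential forgetting rather than a plain cumulative mean. A sketch of such a tracker follows; the decay constant is an assumption, chosen only because 24750 / (1 - 0.995) is about 4.95e6 frames, the same scale as the totals logged here.

    class RunningFrameLoss:
        """Frame-weighted loss average with exponential forgetting.

        Both the loss numerator and the frame count decay each update, so
        the reported frame total plateaus, as in the log's tot_loss[...]
        entries. The decay value is an assumption, not read from icefall.
        """

        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of per-frame losses
            self.frames = 0.0     # decayed count of frames behind that sum

        def update(self, batch_loss: float, batch_frames: float) -> None:
            # batch_loss is the mean per-frame loss for this batch.
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningFrameLoss()
    for _ in range(5000):          # ~24750-frame batches, as in the log
        tracker.update(batch_loss=0.013, batch_frames=24750.0)
    print(f"tot_loss={tracker.tot_loss:.5f} over {tracker.frames:.2f} frames")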
], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:18:11,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=741120.0, ans=0.125 2023-12-22 19:18:16,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-12-22 19:18:20,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=741186.6666666666, ans=0.125 2023-12-22 19:18:41,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741320.0, ans=0.1 2023-12-22 19:18:56,369 INFO [train.py:886] (1/4) Epoch 24, batch 1600, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4945085.77 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:19:26,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=741653.3333333334, ans=15.0 2023-12-22 19:19:28,585 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.060e+01 3.214e+01 3.363e+01 3.973e+01, threshold=6.429e+01, percent-clipped=0.0 2023-12-22 19:19:46,320 INFO [train.py:886] (1/4) Epoch 24, batch 1650, loss[loss=0.0138, audio_tagging_loss=0.0138, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4947752.68 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:20:23,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=741986.6666666666, ans=0.125 2023-12-22 19:20:39,651 INFO [train.py:886] (1/4) Epoch 24, batch 1700, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24014.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4947054.78 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0 2023-12-22 19:20:43,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.56 vs. limit=10.0 2023-12-22 19:20:51,196 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:20:54,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2023-12-22 19:20:57,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=742253.3333333334, ans=0.0 2023-12-22 19:21:03,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=742253.3333333334, ans=0.125 2023-12-22 19:21:12,329 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.769e+01 2.988e+01 3.129e+01 3.264e+01 3.998e+01, threshold=6.258e+01, percent-clipped=0.0 2023-12-22 19:21:16,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=742320.0, ans=0.125 2023-12-22 19:21:27,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. 
limit=15.0 2023-12-22 19:21:29,584 INFO [train.py:886] (1/4) Epoch 24, batch 1750, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4951506.81 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:21:41,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=742520.0, ans=0.125 2023-12-22 19:21:58,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=742586.6666666666, ans=0.125 2023-12-22 19:22:22,569 INFO [train.py:886] (1/4) Epoch 24, batch 1800, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4956448.87 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:22:23,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2023-12-22 19:22:23,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=742786.6666666666, ans=0.0 2023-12-22 19:22:39,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2023-12-22 19:22:44,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=742920.0, ans=0.0 2023-12-22 19:22:52,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=742986.6666666666, ans=0.125 2023-12-22 19:22:55,020 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+01 3.072e+01 3.188e+01 3.328e+01 3.715e+01, threshold=6.376e+01, percent-clipped=0.0 2023-12-22 19:23:01,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=743053.3333333334, ans=0.02 2023-12-22 19:23:12,663 INFO [train.py:886] (1/4) Epoch 24, batch 1850, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4959078.59 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:23:16,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=743120.0, ans=0.0 2023-12-22 19:23:25,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=743186.6666666666, ans=0.0 2023-12-22 19:23:38,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. 
limit=15.0 2023-12-22 19:23:50,022 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:23:51,878 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:23:56,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=743386.6666666666, ans=0.125 2023-12-22 19:24:02,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=743453.3333333334, ans=0.0 2023-12-22 19:24:03,133 INFO [train.py:886] (1/4) Epoch 24, batch 1900, loss[loss=0.01667, audio_tagging_loss=0.01667, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4948444.55 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:24:07,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743453.3333333334, ans=0.1 2023-12-22 19:24:16,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=743520.0, ans=0.125 2023-12-22 19:24:17,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743520.0, ans=0.125 2023-12-22 19:24:23,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=743586.6666666666, ans=0.125 2023-12-22 19:24:29,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=743586.6666666666, ans=0.125 2023-12-22 19:24:30,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=743586.6666666666, ans=0.125 2023-12-22 19:24:35,414 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.079e+01 3.221e+01 3.350e+01 3.840e+01, threshold=6.442e+01, percent-clipped=0.0 2023-12-22 19:24:55,068 INFO [train.py:886] (1/4) Epoch 24, batch 1950, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4943812.81 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:24:59,113 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:25:06,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. 
limit=15.0 2023-12-22 19:25:06,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743853.3333333334, ans=0.125 2023-12-22 19:25:08,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=743853.3333333334, ans=0.125 2023-12-22 19:25:16,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=743920.0, ans=0.125 2023-12-22 19:25:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=743920.0, ans=0.1 2023-12-22 19:25:24,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=743986.6666666666, ans=0.0 2023-12-22 19:25:31,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=743986.6666666666, ans=0.0 2023-12-22 19:25:36,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. limit=6.0 2023-12-22 19:25:45,862 INFO [train.py:886] (1/4) Epoch 24, batch 2000, loss[loss=0.01626, audio_tagging_loss=0.01626, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4942298.02 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:25:46,039 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:26:05,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=12.0 2023-12-22 19:26:07,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744253.3333333334, ans=0.1 2023-12-22 19:26:17,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=744320.0, ans=0.125 2023-12-22 19:26:18,953 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.621e+01 2.994e+01 3.122e+01 3.277e+01 4.126e+01, threshold=6.244e+01, percent-clipped=0.0 2023-12-22 19:26:30,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=744386.6666666666, ans=0.2 2023-12-22 19:26:38,098 INFO [train.py:886] (1/4) Epoch 24, batch 2050, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4944747.67 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:26:43,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.64 vs. 
limit=22.5 2023-12-22 19:26:48,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744520.0, ans=0.1 2023-12-22 19:26:55,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=744520.0, ans=0.05 2023-12-22 19:27:03,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744586.6666666666, ans=0.1 2023-12-22 19:27:22,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=10.0 2023-12-22 19:27:28,906 INFO [train.py:886] (1/4) Epoch 24, batch 2100, loss[loss=0.01553, audio_tagging_loss=0.01553, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4949917.14 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:27:30,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=744786.6666666666, ans=0.125 2023-12-22 19:27:35,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=12.0 2023-12-22 19:27:46,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=744853.3333333334, ans=0.125 2023-12-22 19:27:49,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=744920.0, ans=0.0 2023-12-22 19:28:02,087 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+01 3.004e+01 3.137e+01 3.300e+01 3.808e+01, threshold=6.274e+01, percent-clipped=0.0 2023-12-22 19:28:21,337 INFO [train.py:886] (1/4) Epoch 24, batch 2150, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24057.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4953428.07 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:28:28,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=745120.0, ans=0.07 2023-12-22 19:28:29,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745120.0, ans=0.1 2023-12-22 19:28:37,177 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:28:51,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=745320.0, ans=0.0 2023-12-22 19:29:13,600 INFO [train.py:886] (1/4) Epoch 24, batch 2200, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4947587.93 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0 2023-12-22 19:29:46,460 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.589e+01 3.066e+01 3.183e+01 3.365e+01 3.931e+01, threshold=6.367e+01, percent-clipped=0.0 2023-12-22 19:30:04,874 INFO [train.py:886] (1/4) Epoch 24, batch 2250, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4944194.44 frames. 
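The lr: field decays on two clocks at once: it ticks down within the epoch as batches accumulate (4.55e-03 at the top of this section, 4.51e-03 here) and drops between epochs. That two-factor pattern matches the Eden-style schedule used in icefall recipes, which, as commonly written, multiplies a base rate by inverse-quartic-root decay terms in both the batch index and the epoch. The sketch below uses that form with illustrative parameter values; treat both the exact formula and the numbers as assumptions rather than this run's configuration.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style LR: inverse-quartic-root decay in batches and epochs.

        A sketch of the schedule as commonly written in icefall recipes;
        the form and the default constants are assumptions.
        """
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Deep into training both factors change slowly, which is why the
    # logged lr drifts by only ~1e-05 per few hundred batches:
    print(eden_lr(0.045, batch=130_000, epoch=24))   # ~4.1e-03
    print(eden_lr(0.045, batch=131_000, epoch=24))   # marginally smaller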
], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:30:06,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=745786.6666666666, ans=0.0 2023-12-22 19:30:07,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-12-22 19:30:10,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=745786.6666666666, ans=0.0 2023-12-22 19:30:25,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=745920.0, ans=0.125 2023-12-22 19:30:36,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-12-22 19:30:39,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=745986.6666666666, ans=0.125 2023-12-22 19:30:56,862 INFO [train.py:886] (1/4) Epoch 24, batch 2300, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4944470.67 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:31:22,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=746253.3333333334, ans=0.125 2023-12-22 19:31:29,089 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.760e+01 2.999e+01 3.171e+01 3.296e+01 4.791e+01, threshold=6.342e+01, percent-clipped=0.0 2023-12-22 19:31:48,239 INFO [train.py:886] (1/4) Epoch 24, batch 2350, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4947224.72 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:31:48,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746453.3333333334, ans=0.1 2023-12-22 19:31:58,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=746520.0, ans=0.0 2023-12-22 19:32:41,824 INFO [train.py:886] (1/4) Epoch 24, batch 2400, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4950779.22 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:33:05,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.01 vs. limit=22.5 2023-12-22 19:33:14,825 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.682e+01 2.999e+01 3.142e+01 3.302e+01 4.831e+01, threshold=6.284e+01, percent-clipped=0.0 2023-12-22 19:33:34,066 INFO [train.py:886] (1/4) Epoch 24, batch 2450, loss[loss=0.01486, audio_tagging_loss=0.01486, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4955366.60 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:33:51,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. 
limit=15.0 2023-12-22 19:33:52,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=747186.6666666666, ans=0.125 2023-12-22 19:34:09,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=747320.0, ans=0.125 2023-12-22 19:34:25,712 INFO [train.py:886] (1/4) Epoch 24, batch 2500, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4954611.00 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:34:39,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=747520.0, ans=0.125 2023-12-22 19:34:58,727 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.071e+01 3.224e+01 3.383e+01 4.585e+01, threshold=6.447e+01, percent-clipped=0.0 2023-12-22 19:35:05,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=747653.3333333334, ans=0.125 2023-12-22 19:35:17,088 INFO [train.py:886] (1/4) Epoch 24, batch 2550, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4948894.93 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:35:25,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=747786.6666666666, ans=0.125 2023-12-22 19:35:42,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=747920.0, ans=0.125 2023-12-22 19:35:50,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=747986.6666666666, ans=0.125 2023-12-22 19:35:50,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=747986.6666666666, ans=0.125 2023-12-22 19:35:55,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747986.6666666666, ans=0.125 2023-12-22 19:36:09,583 INFO [train.py:886] (1/4) Epoch 24, batch 2600, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4948476.95 frames. 
], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:36:16,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748120.0, ans=0.0 2023-12-22 19:36:17,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=748120.0, ans=0.125 2023-12-22 19:36:20,256 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:36:41,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=748320.0, ans=0.125 2023-12-22 19:36:42,264 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+01 3.007e+01 3.151e+01 3.323e+01 3.960e+01, threshold=6.303e+01, percent-clipped=0.0 2023-12-22 19:36:50,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748386.6666666666, ans=0.1 2023-12-22 19:37:00,806 INFO [train.py:886] (1/4) Epoch 24, batch 2650, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4950031.11 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:37:02,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2023-12-22 19:37:22,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=748586.6666666666, ans=0.07 2023-12-22 19:37:35,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748653.3333333334, ans=0.1 2023-12-22 19:37:52,402 INFO [train.py:886] (1/4) Epoch 24, batch 2700, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4951417.83 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 32.0 2023-12-22 19:38:02,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=748853.3333333334, ans=0.1 2023-12-22 19:38:04,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748853.3333333334, ans=0.125 2023-12-22 19:38:26,497 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.070e+01 3.195e+01 3.314e+01 3.793e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 19:38:31,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=748986.6666666666, ans=0.2 2023-12-22 19:38:44,138 INFO [train.py:886] (1/4) Epoch 24, batch 2750, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4954816.98 frames. 
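grad_scale is the fp16 loss-scaling factor, and its trajectory through this section, 64.0, briefly 128.0 around batches 1200-1250, back to 64.0, then 32.0 from batch 2700 on, is the signature of dynamic loss scaling: grow the scale after a long streak of finite gradients, halve it the moment an overflow is detected. PyTorch's stock GradScaler implements this policy; icefall uses its own variant, so read the snippet below as an illustration of the mechanism rather than the recipe's exact code.

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(80, 527).to(device)    # stand-in model
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    # On a CPU-only machine GradScaler disables itself with a warning.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=64.0,       # comparable to the log's grad_scale
        growth_factor=2.0,     # 64 -> 128 after a streak of finite grads
        backoff_factor=0.5,    # 128 -> 64, 64 -> 32 on overflow
        growth_interval=2000,  # streak length required before growing
    )

    x = torch.randn(100, 80, device=device)
    y = (torch.rand(100, 527, device=device) < 0.01).float()
    amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(x), y)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(opt)               # unscales; skips the step on overflow
    scaler.update()                # grows or halves the scale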
], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:38:45,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=749120.0, ans=0.125 2023-12-22 19:38:48,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=749120.0, ans=0.2 2023-12-22 19:39:01,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=749186.6666666666, ans=0.0 2023-12-22 19:39:03,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=749253.3333333334, ans=0.0 2023-12-22 19:39:20,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=749320.0, ans=0.125 2023-12-22 19:39:20,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2023-12-22 19:39:35,699 INFO [train.py:886] (1/4) Epoch 24, batch 2800, loss[loss=0.01748, audio_tagging_loss=0.01748, over 24946.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4956318.85 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:39:50,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=749520.0, ans=0.125 2023-12-22 19:40:09,523 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.066e+01 3.201e+01 3.379e+01 3.899e+01, threshold=6.402e+01, percent-clipped=0.0 2023-12-22 19:40:09,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=749653.3333333334, ans=0.125 2023-12-22 19:40:11,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=749653.3333333334, ans=0.125 2023-12-22 19:40:15,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=749653.3333333334, ans=0.0 2023-12-22 19:40:19,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2023-12-22 19:40:28,424 INFO [train.py:886] (1/4) Epoch 24, batch 2850, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4949714.57 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:40:31,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-12-22 19:40:35,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2023-12-22 19:40:51,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749920.0, ans=0.125 2023-12-22 19:40:58,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=749920.0, ans=0.0 2023-12-22 19:41:19,648 INFO [train.py:886] (1/4) Epoch 24, batch 2900, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4949505.92 frames. 
], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:41:23,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=750120.0, ans=0.125 2023-12-22 19:41:52,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=750320.0, ans=0.09899494936611666 2023-12-22 19:41:53,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.002e+01 3.182e+01 3.351e+01 5.287e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-22 19:42:01,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=750386.6666666666, ans=0.125 2023-12-22 19:42:09,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750386.6666666666, ans=0.1 2023-12-22 19:42:12,212 INFO [train.py:886] (1/4) Epoch 24, batch 2950, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4947464.36 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:42:23,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750520.0, ans=0.125 2023-12-22 19:42:38,815 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:42:44,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750653.3333333334, ans=0.125 2023-12-22 19:43:03,928 INFO [train.py:886] (1/4) Epoch 24, batch 3000, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4950136.61 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:43:03,929 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 19:43:24,952 INFO [train.py:917] (1/4) Epoch 24, validation: loss=0.03301, audio_tagging_loss=0.03301, over 3737520.00 frames. 2023-12-22 19:43:24,952 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 19:43:38,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750853.3333333334, ans=0.1 2023-12-22 19:43:44,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.90 vs. 
limit=15.0 2023-12-22 19:43:46,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=750920.0, ans=0.125 2023-12-22 19:43:59,580 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+01 2.995e+01 3.133e+01 3.253e+01 3.784e+01, threshold=6.265e+01, percent-clipped=0.0 2023-12-22 19:43:59,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=750986.6666666666, ans=0.0 2023-12-22 19:44:09,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751053.3333333334, ans=0.1 2023-12-22 19:44:09,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=751053.3333333334, ans=0.1 2023-12-22 19:44:14,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=751053.3333333334, ans=0.125 2023-12-22 19:44:14,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=751053.3333333334, ans=0.125 2023-12-22 19:44:17,113 INFO [train.py:886] (1/4) Epoch 24, batch 3050, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4957289.62 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:44:22,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=751120.0, ans=0.0 2023-12-22 19:44:24,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=751120.0, ans=0.2 2023-12-22 19:44:33,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=751186.6666666666, ans=0.125 2023-12-22 19:44:54,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751320.0, ans=0.1 2023-12-22 19:45:05,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=751386.6666666666, ans=0.125 2023-12-22 19:45:08,382 INFO [train.py:886] (1/4) Epoch 24, batch 3100, loss[loss=0.01531, audio_tagging_loss=0.01531, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4956028.60 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:45:11,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=751453.3333333334, ans=0.0 2023-12-22 19:45:26,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2023-12-22 19:45:43,073 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.717e+01 3.050e+01 3.194e+01 3.363e+01 4.178e+01, threshold=6.387e+01, percent-clipped=0.0 2023-12-22 19:45:52,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751720.0, ans=0.1 2023-12-22 19:45:56,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=751720.0, ans=0.0 2023-12-22 19:45:59,791 INFO [train.py:886] (1/4) Epoch 24, batch 3150, loss[loss=0.01368, audio_tagging_loss=0.01368, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4953807.47 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:46:00,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=751786.6666666666, ans=0.0 2023-12-22 19:46:02,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=751786.6666666666, ans=0.125 2023-12-22 19:46:06,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=751786.6666666666, ans=0.0 2023-12-22 19:46:25,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=751920.0, ans=0.0 2023-12-22 19:46:28,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=751920.0, ans=0.125 2023-12-22 19:46:46,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=752053.3333333334, ans=0.0 2023-12-22 19:46:48,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=752053.3333333334, ans=0.125 2023-12-22 19:46:52,406 INFO [train.py:886] (1/4) Epoch 24, batch 3200, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4948176.96 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:46:57,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. 
limit=15.0 2023-12-22 19:47:00,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=752120.0, ans=0.0 2023-12-22 19:47:23,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=752320.0, ans=0.125 2023-12-22 19:47:23,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=752320.0, ans=0.125 2023-12-22 19:47:26,255 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.034e+01 3.141e+01 3.276e+01 3.703e+01, threshold=6.281e+01, percent-clipped=0.0 2023-12-22 19:47:30,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=752320.0, ans=0.125 2023-12-22 19:47:42,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=752386.6666666666, ans=0.2 2023-12-22 19:47:43,744 INFO [train.py:886] (1/4) Epoch 24, batch 3250, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4947697.25 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:47:47,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=752453.3333333334, ans=0.125 2023-12-22 19:47:50,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=752453.3333333334, ans=0.125 2023-12-22 19:47:51,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2023-12-22 19:48:01,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=752520.0, ans=0.2 2023-12-22 19:48:16,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=752653.3333333334, ans=0.125 2023-12-22 19:48:17,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=752653.3333333334, ans=0.125 2023-12-22 19:48:35,259 INFO [train.py:886] (1/4) Epoch 24, batch 3300, loss[loss=0.01568, audio_tagging_loss=0.01568, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4945314.42 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:48:45,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-12-22 19:49:00,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=752920.0, ans=0.0 2023-12-22 19:49:09,402 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.005e+01 3.126e+01 3.283e+01 3.763e+01, threshold=6.252e+01, percent-clipped=0.0 2023-12-22 19:49:27,688 INFO [train.py:886] (1/4) Epoch 24, batch 3350, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4945863.49 frames. 
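The scaling.py WithLoss lines track auxiliary penalties attached to the attention-weight tensors; the sum is 0.000e+00 almost everywhere, with occasional small nonzero values (e.g. loss-sum=2.569e-03 further up) showing the penalty briefly activating. A generic sketch of the underlying pattern, passing a tensor through unchanged while injecting an extra loss term into its backward pass, is below; the autograd plumbing is standard, but this is not icefall's actual WithLoss code.

    import torch

    class WithLoss(torch.autograd.Function):
        # Forward: identity on x. Backward: pass x's gradient through
        # unchanged and feed a constant gradient of 1 into aux_loss, so
        # the penalty trains the module without changing its output.
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
            ctx.aux_shape = aux_loss.shape
            return x

        @staticmethod
        def backward(ctx, grad_out: torch.Tensor):
            return grad_out, torch.ones(
                ctx.aux_shape, device=grad_out.device)

    attn = torch.rand(4, 100, 100, requires_grad=True)  # fake attn weights
    # A made-up penalty: total attention mass on the first key position.
    aux = attn[..., 0].sum()
    print(f"loss-sum={aux.item():.3e}")  # the quantity the log reports
    out = WithLoss.apply(attn, aux)
    out.sum().backward()  # aux's gradient now also shapes attn.grad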
], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:49:30,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753120.0, ans=0.125 2023-12-22 19:49:44,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=753186.6666666666, ans=0.0 2023-12-22 19:49:58,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=753320.0, ans=0.0 2023-12-22 19:50:04,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-22 19:50:16,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=753386.6666666666, ans=0.125 2023-12-22 19:50:16,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=753386.6666666666, ans=0.125 2023-12-22 19:50:18,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2023-12-22 19:50:18,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=753453.3333333334, ans=0.0 2023-12-22 19:50:19,716 INFO [train.py:886] (1/4) Epoch 24, batch 3400, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4951522.67 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:50:24,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753453.3333333334, ans=0.1 2023-12-22 19:50:25,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=753453.3333333334, ans=0.2 2023-12-22 19:50:26,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=22.5 2023-12-22 19:50:37,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-12-22 19:50:42,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=753586.6666666666, ans=0.125 2023-12-22 19:50:54,455 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 3.015e+01 3.148e+01 3.318e+01 3.787e+01, threshold=6.295e+01, percent-clipped=0.0 2023-12-22 19:50:54,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=753653.3333333334, ans=0.0 2023-12-22 19:51:11,218 INFO [train.py:886] (1/4) Epoch 24, batch 3450, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4940939.05 frames. 
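Throughout the log loss and audio_tagging_loss are identical, so the objective is a single audio-tagging criterion with no secondary terms folded in. For a multi-label tagger the standard choice is binary cross-entropy with logits over the event vocabulary, sketched below; the 527-class size is the usual AudioSet ontology and an assumption about this run, not something stated in this section.

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor,
                           targets: torch.Tensor) -> torch.Tensor:
        """Multi-label BCE over event classes.

        logits, targets: (batch, num_events); targets are 0/1 multi-hot.
        A hedged sketch of a typical audio-tagging criterion, not
        necessarily this recipe's exact loss.
        """
        return F.binary_cross_entropy_with_logits(
            logits, targets, reduction="mean")

    logits = torch.randn(100, 527)
    targets = (torch.rand(100, 527) < 0.01).float()  # sparse multi-hot
    print(audio_tagging_loss(logits, targets))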
2023-12-22 19:51:13,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=753786.6666666666, ans=0.0
2023-12-22 19:51:27,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=753853.3333333334, ans=0.125
2023-12-22 19:51:38,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=753920.0, ans=0.09899494936611666
2023-12-22 19:51:44,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=753986.6666666666, ans=0.0
2023-12-22 19:51:47,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0
2023-12-22 19:51:54,342 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:52:03,993 INFO [train.py:886] (1/4) Epoch 24, batch 3500, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4943455.85 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0
2023-12-22 19:52:38,069 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 3.028e+01 3.212e+01 3.392e+01 3.862e+01, threshold=6.425e+01, percent-clipped=0.0
2023-12-22 19:52:54,934 INFO [train.py:886] (1/4) Epoch 24, batch 3550, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4947491.93 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0
2023-12-22 19:53:10,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=754520.0, ans=0.0
2023-12-22 19:53:16,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.98 vs. limit=22.5
2023-12-22 19:53:26,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0
2023-12-22 19:53:33,718 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.584e-02
2023-12-22 19:53:47,387 INFO [train.py:886] (1/4) Epoch 24, batch 3600, loss[loss=0.00956, audio_tagging_loss=0.00956, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4948335.37 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0
2023-12-22 19:53:57,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=754853.3333333334, ans=0.125
2023-12-22 19:54:08,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754920.0, ans=0.1
2023-12-22 19:54:10,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754920.0, ans=0.1
2023-12-22 19:54:18,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=12.0
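The WithLoss entries from scaling.py:1118 track an auxiliary penalty attached to each self_attn_weights module; loss-sum=0.000e+00 means the attention scores stayed inside the penalty-free region over the logging window, while the occasional nonzero value (loss-sum=1.584e-02 above) shows the regularizer firing. A hedged sketch of this kind of penalty, with the limit and scale chosen purely for illustration:

```python
import torch

def abs_value_penalty(w: torch.Tensor, limit: float = 25.0,
                      scale: float = 0.01) -> torch.Tensor:
    """Scalar penalty on |w| above `limit`; add it to the training loss."""
    return scale * (w.abs() - limit).relu().sum()

w = torch.randn(4, 8, 100, 100) * 5.0                  # stand-in attention scores
print(f"loss-sum={abs_value_penalty(w).item():.3e}")   # 0.000e+00 on most steps
```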
2023-12-22 19:54:20,698 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 2.974e+01 3.085e+01 3.211e+01 3.722e+01, threshold=6.169e+01, percent-clipped=0.0
2023-12-22 19:54:37,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=755120.0, ans=0.0
2023-12-22 19:54:38,350 INFO [train.py:886] (1/4) Epoch 24, batch 3650, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4950811.74 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0
2023-12-22 19:55:01,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0
2023-12-22 19:55:08,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755320.0, ans=0.1
2023-12-22 19:55:29,376 INFO [train.py:886] (1/4) Epoch 24, batch 3700, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4954956.99 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0
2023-12-22 19:55:47,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755520.0, ans=0.1
2023-12-22 19:55:49,753 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:55:54,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0
2023-12-22 19:56:03,431 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.064e+01 3.214e+01 3.340e+01 3.763e+01, threshold=6.428e+01, percent-clipped=0.0
2023-12-22 19:56:20,939 INFO [train.py:886] (1/4) Epoch 24, batch 3750, loss[loss=0.01622, audio_tagging_loss=0.01622, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4948031.86 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 19:56:34,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=755853.3333333334, ans=0.125
2023-12-22 19:56:36,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=755853.3333333334, ans=0.125
2023-12-22 19:56:39,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.49 vs. limit=22.5
2023-12-22 19:57:12,855 INFO [train.py:886] (1/4) Epoch 24, batch 3800, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4946897.13 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 19:57:47,017 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 3.075e+01 3.225e+01 3.353e+01 3.871e+01, threshold=6.449e+01, percent-clipped=0.0
2023-12-22 19:57:53,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=756386.6666666666, ans=0.0
2023-12-22 19:58:04,514 INFO [train.py:886] (1/4) Epoch 24, batch 3850, loss[loss=0.01463, audio_tagging_loss=0.01463, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4943872.89 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0
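The optim.py:484 warnings summarize recent gradient norms as quartiles (min/25%/median/75%/max) together with the clipping threshold; with Clipping_scale=2.0 the threshold sits at twice the median (e.g. 2 x 3.085e+01 = 6.169e+01 in the first warning above), and percent-clipped reports how often gradients actually hit it. A sketch of that bookkeeping, with the window size an assumption:

```python
from collections import deque
import numpy as np

class GradNormClipper:
    def __init__(self, window: int = 128, clipping_scale: float = 2.0):
        self.norms = deque(maxlen=window)     # recent gradient norms
        self.clipped = deque(maxlen=window)   # whether each step was clipped
        self.clipping_scale = clipping_scale

    def threshold(self) -> float:
        return self.clipping_scale * float(np.median(self.norms))

    def update(self, grad_norm: float) -> float:
        """Record one step's norm; return the scale (<= 1.0) to apply."""
        self.norms.append(grad_norm)
        thr = self.threshold()                # includes the current norm
        self.clipped.append(grad_norm > thr)
        return min(1.0, thr / max(grad_norm, 1e-20))

    def summary(self) -> str:
        q = np.quantile(self.norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        quart = " ".join(f"{v:.3e}" for v in q)
        pct = 100.0 * float(np.mean(self.clipped))
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                f"{quart}, threshold={self.threshold():.3e}, percent-clipped={pct:.1f}")

clipper = GradNormClipper()
for norm in np.random.uniform(27.0, 38.0, size=100):   # norms like the log's
    clipper.update(float(norm))
print(clipper.summary())
```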
2023-12-22 19:58:07,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0
2023-12-22 19:58:54,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=756720.0, ans=0.125
2023-12-22 19:58:56,726 INFO [train.py:886] (1/4) Epoch 24, batch 3900, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4951258.22 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 19:58:59,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=756786.6666666666, ans=0.125
2023-12-22 19:59:11,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=756853.3333333334, ans=0.125
2023-12-22 19:59:22,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0
2023-12-22 19:59:29,998 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.786e+01 3.006e+01 3.171e+01 3.357e+01 4.139e+01, threshold=6.342e+01, percent-clipped=0.0
2023-12-22 19:59:34,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0
2023-12-22 19:59:42,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0
2023-12-22 19:59:45,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=757053.3333333334, ans=0.125
2023-12-22 19:59:46,974 INFO [train.py:886] (1/4) Epoch 24, batch 3950, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4951861.48 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 20:00:25,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=757320.0, ans=0.0
2023-12-22 20:00:26,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757320.0, ans=0.1
2023-12-22 20:00:28,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=757386.6666666666, ans=0.2
2023-12-22 20:00:32,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=757386.6666666666, ans=0.0
2023-12-22 20:00:35,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=757386.6666666666, ans=10.0
2023-12-22 20:00:36,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=757386.6666666666, ans=0.0
2023-12-22 20:00:39,003 INFO [train.py:886] (1/4) Epoch 24, batch 4000, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4947803.10 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0
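Each train.py:886 line pairs the current batch's loss with tot_loss, a frame-weighted running average; the frame total hovering near five million shows that older batches are decayed rather than accumulated forever. A hedged sketch of such a tracker, where the decay constant is chosen so that, at roughly 25,000 frames per batch, the steady-state window is about 5M frames as in the log; icefall's actual MetricsTracker differs in detail:

```python
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0    # decayed sum of loss * frames
        self.frame_sum = 0.0   # decayed sum of frames

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames

    def __str__(self) -> str:
        avg = self.loss_sum / max(self.frame_sum, 1.0)
        return f"tot_loss[loss={avg:.4}, over {self.frame_sum:.2f} frames.]"

tracker = RunningLoss()
for _ in range(5000):
    tracker.update(loss=0.013, num_frames=25_000.0)
print(tracker)   # frame total converges to 25000 / (1 - 0.995) = 5,000,000
```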
2023-12-22 20:01:11,813 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.071e+01 3.185e+01 3.335e+01 3.976e+01, threshold=6.370e+01, percent-clipped=0.0
2023-12-22 20:01:12,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=757653.3333333334, ans=0.0
2023-12-22 20:01:20,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=757720.0, ans=0.125
2023-12-22 20:01:29,361 INFO [train.py:886] (1/4) Epoch 24, batch 4050, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4950766.07 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 20:01:33,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.22 vs. limit=10.0
2023-12-22 20:01:34,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=757786.6666666666, ans=0.125
2023-12-22 20:01:38,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=757853.3333333334, ans=0.07
2023-12-22 20:01:40,600 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:01:42,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=757853.3333333334, ans=0.04949747468305833
2023-12-22 20:01:55,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2023-12-22 20:02:18,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=12.0
2023-12-22 20:02:20,030 INFO [train.py:886] (1/4) Epoch 24, batch 4100, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4949277.62 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 20:02:21,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=758120.0, ans=0.2
2023-12-22 20:02:33,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=758186.6666666666, ans=0.125
2023-12-22 20:02:40,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758253.3333333334, ans=0.0
2023-12-22 20:02:54,135 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.829e+01 3.126e+01 3.267e+01 3.430e+01 4.193e+01, threshold=6.535e+01, percent-clipped=0.0
2023-12-22 20:02:57,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=758320.0, ans=0.125
2023-12-22 20:03:04,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=758386.6666666666, ans=0.125
2023-12-22 20:03:06,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=12.0
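Names like bypass.skip_rate, attention_skip_rate, conv_skip_rate and ff3_skip_rate (ans=0.07 and ans=0.0495 above) are stochastic-depth style rates: with that probability the corresponding submodule's contribution is suppressed during training, and the schedules drive the rates toward zero as training matures. A minimal sketch of the idea; drawing one mask per sequence is an assumption, as the real modules can be finer-grained:

```python
import torch

def bypass(x: torch.Tensor, layer_out: torch.Tensor,
           skip_rate: float, training: bool) -> torch.Tensor:
    """Randomly fall back to the residual input `x` with prob `skip_rate`."""
    if not training or skip_rate == 0.0:
        return layer_out
    keep = (torch.rand(x.shape[0], 1, 1, device=x.device) >= skip_rate).float()
    return x + keep * (layer_out - x)   # skipped sequences pass x through

x = torch.randn(8, 100, 256)
y = x + 0.1 * torch.randn_like(x)       # stand-in for a layer's output
print(bypass(x, y, skip_rate=0.07, training=True).shape)  # (8, 100, 256)
```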
2023-12-22 20:03:08,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=758386.6666666666, ans=0.125
2023-12-22 20:03:11,638 INFO [train.py:886] (1/4) Epoch 24, batch 4150, loss[loss=0.01472, audio_tagging_loss=0.01472, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4953610.10 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 20:03:17,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=758453.3333333334, ans=0.0
2023-12-22 20:03:44,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=758653.3333333334, ans=0.125
2023-12-22 20:03:51,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.06 vs. limit=22.5
2023-12-22 20:03:52,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=758653.3333333334, ans=0.0
2023-12-22 20:04:03,479 INFO [train.py:886] (1/4) Epoch 24, batch 4200, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4957987.87 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0
2023-12-22 20:04:17,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=758853.3333333334, ans=0.125
2023-12-22 20:04:35,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0
2023-12-22 20:04:38,258 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.640e+01 3.033e+01 3.184e+01 3.387e+01 4.147e+01, threshold=6.368e+01, percent-clipped=0.0
2023-12-22 20:04:45,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=12.0
2023-12-22 20:04:48,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=759053.3333333334, ans=0.025
2023-12-22 20:04:55,890 INFO [train.py:886] (1/4) Epoch 24, batch 4250, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4953051.37 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:04:58,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0
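The many balancer entries (balancer1.prob, min_positive=0.025 above, max_abs=10.0 a little earlier) belong to activation balancers: regularizers that watch per-channel statistics and, with a scheduled probability, nudge gradients so each channel keeps at least a minimum fraction of positive values and a bounded magnitude. The sketch below only computes the statistics such a balancer polices; the gradient-side intervention in icefall's scaling.Balancer is more involved:

```python
import torch

def balancer_violations(x: torch.Tensor, min_positive: float = 0.05,
                        max_abs: float = 10.0) -> dict:
    """x: (num_frames, num_channels); count channels breaking each constraint."""
    frac_pos = (x > 0).float().mean(dim=0)   # fraction of positive values per channel
    rms = x.pow(2).mean(dim=0).sqrt()        # per-channel RMS magnitude
    return {
        "channels_too_negative": int((frac_pos < min_positive).sum()),
        "channels_too_large": int((rms > max_abs).sum()),
    }

x = torch.randn(1000, 256) - 2.0             # heavily negative activations
print(balancer_violations(x))                # nearly all 256 channels flagged
```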
2023-12-22 20:05:06,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=759186.6666666666, ans=0.0
2023-12-22 20:05:19,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=759253.3333333334, ans=0.0
2023-12-22 20:05:26,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759320.0, ans=0.125
2023-12-22 20:05:34,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=759320.0, ans=0.0
2023-12-22 20:05:36,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0
2023-12-22 20:05:47,503 INFO [train.py:886] (1/4) Epoch 24, batch 4300, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4954881.04 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:05:54,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=759453.3333333334, ans=0.05
2023-12-22 20:06:04,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=759520.0, ans=0.2
2023-12-22 20:06:11,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=759586.6666666666, ans=0.1
2023-12-22 20:06:21,639 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.739e+01 3.083e+01 3.257e+01 3.408e+01 3.899e+01, threshold=6.514e+01, percent-clipped=0.0
2023-12-22 20:06:28,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=759720.0, ans=0.125
2023-12-22 20:06:39,220 INFO [train.py:886] (1/4) Epoch 24, batch 4350, loss[loss=0.01498, audio_tagging_loss=0.01498, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4951316.34 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:06:39,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=759786.6666666666, ans=0.125
2023-12-22 20:07:25,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=760053.3333333334, ans=0.0
2023-12-22 20:07:29,314 INFO [train.py:886] (1/4) Epoch 24, batch 4400, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4948670.36 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:07:33,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=760120.0, ans=0.0
2023-12-22 20:07:59,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0
2023-12-22 20:08:01,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=760320.0, ans=0.125
2023-12-22 20:08:03,221 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.725e+01 3.094e+01 3.285e+01 3.414e+01 4.025e+01, threshold=6.569e+01, percent-clipped=0.0
2023-12-22 20:08:17,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0
2023-12-22 20:08:20,887 INFO [train.py:886] (1/4) Epoch 24, batch 4450, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4944607.20 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:08:31,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.63 vs. limit=22.5
2023-12-22 20:09:00,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=760720.0, ans=0.0
2023-12-22 20:09:07,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=760720.0, ans=0.125
2023-12-22 20:09:11,069 INFO [train.py:886] (1/4) Epoch 24, batch 4500, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4942907.91 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:09:44,441 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.800e+01 3.026e+01 3.195e+01 3.356e+01 3.785e+01, threshold=6.390e+01, percent-clipped=0.0
2023-12-22 20:09:51,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=761053.3333333334, ans=0.125
2023-12-22 20:10:02,168 INFO [train.py:886] (1/4) Epoch 24, batch 4550, loss[loss=0.01011, audio_tagging_loss=0.01011, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4948338.56 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:10:08,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=761120.0, ans=0.0
2023-12-22 20:10:52,872 INFO [train.py:886] (1/4) Epoch 24, batch 4600, loss[loss=0.01252, audio_tagging_loss=0.01252, over 21959.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4949687.99 frames. ], batch size: 107, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:10:53,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=761453.3333333334, ans=0.2
2023-12-22 20:11:07,848 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:11:09,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.24 vs. limit=15.0
2023-12-22 20:11:26,353 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.049e+01 3.188e+01 3.293e+01 3.934e+01, threshold=6.377e+01, percent-clipped=0.0
2023-12-22 20:11:29,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=12.0
2023-12-22 20:11:43,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=761786.6666666666, ans=0.125
2023-12-22 20:11:43,936 INFO [train.py:886] (1/4) Epoch 24, batch 4650, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4954633.00 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0
2023-12-22 20:11:45,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=761786.6666666666, ans=0.125
2023-12-22 20:11:55,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761853.3333333334, ans=0.125
2023-12-22 20:12:09,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761920.0, ans=0.125
2023-12-22 20:12:15,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.83 vs. limit=22.5
2023-12-22 20:12:23,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=761986.6666666666, ans=0.125
2023-12-22 20:12:34,692 INFO [train.py:886] (1/4) Epoch 24, batch 4700, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4948317.67 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 64.0
2023-12-22 20:12:42,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=762186.6666666666, ans=0.125
2023-12-22 20:12:52,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=762186.6666666666, ans=0.125
2023-12-22 20:13:04,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=762320.0, ans=0.125
2023-12-22 20:13:04,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0
2023-12-22 20:13:06,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.692e+01 3.057e+01 3.259e+01 3.422e+01 3.879e+01, threshold=6.518e+01, percent-clipped=0.0
2023-12-22 20:13:09,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762320.0, ans=0.1
2023-12-22 20:13:16,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=762386.6666666666, ans=0.0
2023-12-22 20:13:22,086 INFO [train.py:886] (1/4) Epoch 24, batch 4750, loss[loss=0.01384, audio_tagging_loss=0.01384, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4943395.66 frames. ], batch size: 99, lr: 4.46e-03, grad_scale: 64.0
2023-12-22 20:13:33,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=762520.0, ans=0.125
2023-12-22 20:13:57,231 INFO [train.py:886] (1/4) Epoch 25, batch 0, loss[loss=0.02823, audio_tagging_loss=0.02823, over 24096.00 frames. ], tot_loss[loss=0.02823, audio_tagging_loss=0.02823, over 24096.00 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
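The grad_scale field is the dynamic loss scale of mixed-precision training: it doubles from 32.0 to 64.0 at batch 4700 after a long stretch of overflow-free steps, and is back at 32.0 by the start of epoch 25, consistent with a backoff on a later overflow (the halving itself is not logged here). A minimal sketch with PyTorch's own GradScaler; the growth_interval is an assumption and a CUDA device is required:

```python
import torch  # requires a CUDA device, as in the logged run

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)
model = torch.nn.Linear(80, 527).cuda()
opt = torch.optim.SGD(model.parameters(), lr=4.47e-3)
x = torch.randn(16, 80, device="cuda")

for step in range(3):
    opt.zero_grad()
    with torch.autocast("cuda", dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()   # backward through the scaled loss
    scaler.step(opt)                # unscales grads; skips the step on inf/nan
    scaler.update()                 # x2 after clean streak, /2 on overflow
    print(step, scaler.get_scale())
```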
2023-12-22 20:13:57,231 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 20:14:18,052 INFO [train.py:917] (1/4) Epoch 25, validation: loss=0.03205, audio_tagging_loss=0.03205, over 3737520.00 frames.
2023-12-22 20:14:18,053 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-22 20:14:18,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0
2023-12-22 20:14:20,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.66 vs. limit=15.0
2023-12-22 20:14:55,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. limit=10.0
2023-12-22 20:15:09,647 INFO [train.py:886] (1/4) Epoch 25, batch 50, loss[loss=0.01724, audio_tagging_loss=0.01724, over 23997.00 frames. ], tot_loss[loss=0.02059, audio_tagging_loss=0.02059, over 1118694.52 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:15:15,595 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:15:18,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762960.0, ans=0.1
2023-12-22 20:15:27,233 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.949e+01 3.385e+01 3.837e+01 4.351e+01 9.829e+01, threshold=7.674e+01, percent-clipped=6.0
2023-12-22 20:15:31,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=763026.6666666666, ans=0.0
2023-12-22 20:15:45,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763093.3333333334, ans=0.1
2023-12-22 20:16:00,581 INFO [train.py:886] (1/4) Epoch 25, batch 100, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.0179, audio_tagging_loss=0.0179, over 1968112.69 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:16:22,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=763360.0, ans=0.125
2023-12-22 20:16:52,795 INFO [train.py:886] (1/4) Epoch 25, batch 150, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 2632203.46 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:16:58,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2023-12-22 20:17:10,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=15.0
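At each epoch boundary the running statistics start over: at "Epoch 25, batch 0" tot_loss equals the single batch's loss (0.02823) and then decays back toward ~0.013 over the following few hundred batches, a full validation pass is computed over 3,737,520 frames, and the learning rate steps from 4.47e-03 to 4.37e-03 under the epoch-dependent part of the schedule; the grad-norm spike right after the restart (max 9.829e+01, percent-clipped=6.0) shows in the epoch's first optim.py warning. A hedged sketch of the frame-weighted validation loop; audio tagging over 527 AudioSet classes is typically scored with per-class binary cross-entropy, and the helper below is illustrative rather than icefall's actual code:

```python
import torch

def validate(model: torch.nn.Module, loader) -> float:
    """Frame-weighted validation loss, as logged at each epoch's batch 0."""
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for feats, targets, num_frames in loader:
            logits = model(feats)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                logits, targets, reduction="sum") / logits.shape[-1]
            loss_sum += loss.item()
            frames += num_frames
    model.train()
    return loss_sum / frames

model = torch.nn.Linear(80, 527)   # stand-in for the zipformer encoder + head
loader = [(torch.randn(4, 80), torch.randint(0, 2, (4, 527)).float(), 400.0)
          for _ in range(3)]
print(f"validation: loss={validate(model, loader):.4}")
```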
2023-12-22 20:17:10,527 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.911e+01 3.174e+01 3.367e+01 3.565e+01 4.203e+01, threshold=6.734e+01, percent-clipped=0.0
2023-12-22 20:17:12,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=763693.3333333334, ans=0.125
2023-12-22 20:17:16,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=763693.3333333334, ans=0.0
2023-12-22 20:17:17,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0
2023-12-22 20:17:23,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763760.0, ans=0.1
2023-12-22 20:17:34,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0
2023-12-22 20:17:40,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2023-12-22 20:17:40,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0
2023-12-22 20:17:41,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=22.5
2023-12-22 20:17:44,280 INFO [train.py:886] (1/4) Epoch 25, batch 200, loss[loss=0.01565, audio_tagging_loss=0.01565, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 3147340.47 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:18:05,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=764026.6666666666, ans=0.125
2023-12-22 20:18:07,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=764026.6666666666, ans=0.0
2023-12-22 20:18:12,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=764026.6666666666, ans=0.0
2023-12-22 20:18:15,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764093.3333333334, ans=0.125
2023-12-22 20:18:17,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=764093.3333333334, ans=0.0
2023-12-22 20:18:21,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=764093.3333333334, ans=0.2
2023-12-22 20:18:25,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0
2023-12-22 20:18:37,136 INFO [train.py:886] (1/4) Epoch 25, batch 250, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 3550944.15 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:18:45,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=764226.6666666666, ans=0.125
2023-12-22 20:18:45,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764293.3333333334, ans=0.1
2023-12-22 20:18:49,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=764293.3333333334, ans=0.0
2023-12-22 20:18:54,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764293.3333333334, ans=0.1
2023-12-22 20:18:55,656 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.850e+01 3.053e+01 3.207e+01 3.358e+01 3.968e+01, threshold=6.413e+01, percent-clipped=0.0
2023-12-22 20:19:14,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=764426.6666666666, ans=0.125
2023-12-22 20:19:17,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764493.3333333334, ans=0.125
2023-12-22 20:19:28,798 INFO [train.py:886] (1/4) Epoch 25, batch 300, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 3857347.03 frames. ], batch size: 99, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:19:29,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0
2023-12-22 20:19:44,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=12.0
2023-12-22 20:19:45,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=764626.6666666666, ans=0.0
2023-12-22 20:20:02,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=764760.0, ans=0.125
2023-12-22 20:20:04,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.39 vs. limit=12.0
2023-12-22 20:20:05,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=764760.0, ans=0.0
2023-12-22 20:20:14,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=764826.6666666666, ans=0.125
2023-12-22 20:20:18,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764826.6666666666, ans=0.1
2023-12-22 20:20:19,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0
2023-12-22 20:20:20,615 INFO [train.py:886] (1/4) Epoch 25, batch 350, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4088469.80 frames. ], batch size: 99, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:20:23,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=764893.3333333334, ans=0.0
2023-12-22 20:20:26,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=764893.3333333334, ans=0.125
2023-12-22 20:20:37,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5
2023-12-22 20:20:40,133 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.026e+01 3.207e+01 3.330e+01 3.805e+01, threshold=6.415e+01, percent-clipped=0.0
2023-12-22 20:20:46,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5
2023-12-22 20:21:00,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=765093.3333333334, ans=0.09899494936611666
2023-12-22 20:21:13,462 INFO [train.py:886] (1/4) Epoch 25, batch 400, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4270757.42 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:21:37,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=765360.0, ans=0.125
2023-12-22 20:21:38,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.10 vs. limit=22.5
2023-12-22 20:21:40,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=765360.0, ans=0.0
2023-12-22 20:21:41,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=765360.0, ans=0.2
2023-12-22 20:21:44,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=765426.6666666666, ans=0.2
2023-12-22 20:21:44,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=765426.6666666666, ans=0.125
2023-12-22 20:21:54,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=765493.3333333334, ans=0.1
2023-12-22 20:21:58,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2023-12-22 20:22:04,093 INFO [train.py:886] (1/4) Epoch 25, batch 450, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4424128.26 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:22:07,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=765560.0, ans=0.95
2023-12-22 20:22:13,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=765626.6666666666, ans=0.0
2023-12-22 20:22:23,130 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 3.006e+01 3.168e+01 3.345e+01 4.036e+01, threshold=6.336e+01, percent-clipped=0.0
2023-12-22 20:22:39,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=765760.0, ans=0.1
2023-12-22 20:22:40,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=765760.0, ans=0.0
2023-12-22 20:22:54,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=765826.6666666666, ans=0.125
2023-12-22 20:22:56,178 INFO [train.py:886] (1/4) Epoch 25, batch 500, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4543623.74 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:22:57,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=765893.3333333334, ans=0.125
2023-12-22 20:23:15,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766026.6666666666, ans=0.1
2023-12-22 20:23:28,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.54 vs. limit=10.0
2023-12-22 20:23:47,580 INFO [train.py:886] (1/4) Epoch 25, batch 550, loss[loss=0.01305, audio_tagging_loss=0.01305, over 24033.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4637244.23 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:23:50,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=766226.6666666666, ans=0.125
2023-12-22 20:23:56,015 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:24:00,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.32 vs. limit=15.0
2023-12-22 20:24:05,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.053e+01 3.176e+01 3.328e+01 4.174e+01, threshold=6.352e+01, percent-clipped=0.0
2023-12-22 20:24:22,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=766426.6666666666, ans=0.125
2023-12-22 20:24:34,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=766493.3333333334, ans=0.125
2023-12-22 20:24:36,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.81 vs. limit=12.0
2023-12-22 20:24:39,256 INFO [train.py:886] (1/4) Epoch 25, batch 600, loss[loss=0.01137, audio_tagging_loss=0.01137, over 21329.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4697227.94 frames. ], batch size: 107, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:24:47,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=766560.0, ans=0.125
2023-12-22 20:25:02,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766693.3333333334, ans=0.1
2023-12-22 20:25:06,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=766693.3333333334, ans=0.125
2023-12-22 20:25:31,423 INFO [train.py:886] (1/4) Epoch 25, batch 650, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4750617.10 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:25:46,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=766960.0, ans=0.0
2023-12-22 20:25:49,885 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.858e+01 3.083e+01 3.219e+01 3.359e+01 3.843e+01, threshold=6.437e+01, percent-clipped=0.0
2023-12-22 20:25:53,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=767026.6666666666, ans=0.0
2023-12-22 20:25:58,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=767026.6666666666, ans=0.125
2023-12-22 20:25:59,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=767026.6666666666, ans=0.125
2023-12-22 20:25:59,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=767026.6666666666, ans=0.0
2023-12-22 20:26:02,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=767093.3333333334, ans=0.125
2023-12-22 20:26:04,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=767093.3333333334, ans=0.0
2023-12-22 20:26:04,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767093.3333333334, ans=0.1
2023-12-22 20:26:10,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2023-12-22 20:26:14,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=767160.0, ans=0.2
2023-12-22 20:26:23,595 INFO [train.py:886] (1/4) Epoch 25, batch 700, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4787397.29 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:26:24,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=767226.6666666666, ans=0.2
2023-12-22 20:26:26,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=12.0
2023-12-22 20:26:33,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=767293.3333333334, ans=0.125
2023-12-22 20:26:34,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5
2023-12-22 20:26:42,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=767293.3333333334, ans=0.125
2023-12-22 20:27:11,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767493.3333333334, ans=0.125
2023-12-22 20:27:11,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767493.3333333334, ans=0.1
2023-12-22 20:27:15,237 INFO [train.py:886] (1/4) Epoch 25, batch 750, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4826570.17 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:27:33,652 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.775e+01 3.045e+01 3.178e+01 3.302e+01 3.824e+01, threshold=6.355e+01, percent-clipped=0.0
2023-12-22 20:27:33,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=767626.6666666666, ans=0.0
2023-12-22 20:27:35,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=767693.3333333334, ans=0.125
2023-12-22 20:28:03,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=767826.6666666666, ans=0.125
2023-12-22 20:28:06,856 INFO [train.py:886] (1/4) Epoch 25, batch 800, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4857164.87 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:28:07,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=767893.3333333334, ans=15.0
2023-12-22 20:28:22,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0
2023-12-22 20:28:31,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0
2023-12-22 20:28:40,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0
2023-12-22 20:28:44,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768093.3333333334, ans=0.125
2023-12-22 20:28:48,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=768160.0, ans=0.125
2023-12-22 20:28:58,436 INFO [train.py:886] (1/4) Epoch 25, batch 850, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4873909.76 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:29:04,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=768226.6666666666, ans=0.1
2023-12-22 20:29:06,788 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:29:15,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768293.3333333334, ans=0.1
2023-12-22 20:29:15,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=768293.3333333334, ans=0.07
2023-12-22 20:29:17,719 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.700e+01 3.008e+01 3.165e+01 3.343e+01 3.656e+01, threshold=6.329e+01, percent-clipped=0.0
2023-12-22 20:29:20,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=768360.0, ans=0.125
2023-12-22 20:29:30,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768426.6666666666, ans=0.1
2023-12-22 20:29:33,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=768426.6666666666, ans=0.125
2023-12-22 20:29:41,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=768493.3333333334, ans=0.125
2023-12-22 20:29:42,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=768493.3333333334, ans=0.2
2023-12-22 20:29:50,958 INFO [train.py:886] (1/4) Epoch 25, batch 900, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4896071.97 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:29:54,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768560.0, ans=0.125
2023-12-22 20:29:54,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=768560.0, ans=0.04949747468305833
2023-12-22 20:29:56,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=768560.0, ans=0.2
2023-12-22 20:29:57,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=768560.0, ans=0.125
2023-12-22 20:30:03,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=8.0
2023-12-22 20:30:15,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.01 vs. limit=15.0
2023-12-22 20:30:18,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=768693.3333333334, ans=0.0
2023-12-22 20:30:25,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=768760.0, ans=0.0
2023-12-22 20:30:29,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=768760.0, ans=0.125
2023-12-22 20:30:36,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=768826.6666666666, ans=0.125
2023-12-22 20:30:43,136 INFO [train.py:886] (1/4) Epoch 25, batch 950, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4899370.29 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:30:47,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=768893.3333333334, ans=0.0
2023-12-22 20:30:48,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=768893.3333333334, ans=0.125
2023-12-22 20:30:49,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=768893.3333333334, ans=0.125
2023-12-22 20:31:00,898 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.092e+01 3.233e+01 3.411e+01 4.030e+01, threshold=6.467e+01, percent-clipped=0.0
2023-12-22 20:31:06,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=769026.6666666666, ans=0.0
2023-12-22 20:31:18,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0
2023-12-22 20:31:26,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=769160.0, ans=0.125
2023-12-22 20:31:34,093 INFO [train.py:886] (1/4) Epoch 25, batch 1000, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4911096.13 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:31:39,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=769226.6666666666, ans=0.0
2023-12-22 20:31:46,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769293.3333333334, ans=0.125
2023-12-22 20:31:46,872 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:31:47,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=769293.3333333334, ans=0.125
2023-12-22 20:31:48,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0
2023-12-22 20:31:51,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=769293.3333333334, ans=0.0
2023-12-22 20:32:21,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=769493.3333333334, ans=0.125
2023-12-22 20:32:26,429 INFO [train.py:886] (1/4) Epoch 25, batch 1050, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24750.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4922196.39 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:32:30,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769560.0, ans=0.1
2023-12-22 20:32:44,772 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.666e+01 3.041e+01 3.195e+01 3.319e+01 3.773e+01, threshold=6.390e+01, percent-clipped=0.0
2023-12-22 20:32:45,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-12-22 20:32:56,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=12.0
2023-12-22 20:33:15,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0
2023-12-22 20:33:18,082 INFO [train.py:886] (1/4) Epoch 25, batch 1100, loss[loss=0.01035, audio_tagging_loss=0.01035, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4931699.09 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:33:23,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=769893.3333333334, ans=0.2
2023-12-22 20:33:26,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=769893.3333333334, ans=0.125
2023-12-22 20:33:59,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770160.0, ans=0.125
2023-12-22 20:34:09,554 INFO [train.py:886] (1/4) Epoch 25, batch 1150, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4936067.53 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:34:28,873 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.047e+01 3.195e+01 3.340e+01 6.361e+01, threshold=6.391e+01, percent-clipped=0.0
2023-12-22 20:34:47,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=770426.6666666666, ans=0.125
2023-12-22 20:34:51,495 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:35:02,107 INFO [train.py:886] (1/4) Epoch 25, batch 1200, loss[loss=0.009952, audio_tagging_loss=0.009952, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4945168.47 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:35:02,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=12.0
2023-12-22 20:35:21,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=770693.3333333334, ans=0.0
2023-12-22 20:35:38,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.46 vs. limit=22.5
2023-12-22 20:35:40,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0
2023-12-22 20:35:41,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0
2023-12-22 20:35:45,488 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:35:53,924 INFO [train.py:886] (1/4) Epoch 25, batch 1250, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4948174.93 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:35:58,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=770893.3333333334, ans=0.0
2023-12-22 20:36:04,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770960.0, ans=0.1
2023-12-22 20:36:12,984 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.141e+01 3.235e+01 3.393e+01 3.874e+01, threshold=6.470e+01, percent-clipped=0.0
2023-12-22 20:36:18,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=771026.6666666666, ans=0.015
2023-12-22 20:36:22,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=771026.6666666666, ans=0.0
2023-12-22 20:36:46,653 INFO [train.py:886] (1/4) Epoch 25, batch 1300, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4944285.61 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:36:59,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=771293.3333333334, ans=0.2
2023-12-22 20:37:01,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=771293.3333333334, ans=0.125
2023-12-22 20:37:01,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=771293.3333333334, ans=0.0
2023-12-22 20:37:02,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=771293.3333333334, ans=0.0
2023-12-22 20:37:06,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.23 vs. limit=15.0
2023-12-22 20:37:07,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5
2023-12-22 20:37:15,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=771360.0, ans=0.0
2023-12-22 20:37:35,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=771493.3333333334, ans=15.0
2023-12-22 20:37:38,474 INFO [train.py:886] (1/4) Epoch 25, batch 1350, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4945110.74 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:37:56,909 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.059e+01 3.211e+01 3.317e+01 3.906e+01, threshold=6.423e+01, percent-clipped=0.0
2023-12-22 20:38:00,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-12-22 20:38:29,833 INFO [train.py:886] (1/4) Epoch 25, batch 1400, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4948518.06 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:38:31,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=771893.3333333334, ans=0.0
2023-12-22 20:38:34,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=771893.3333333334, ans=0.0
2023-12-22 20:38:41,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=771960.0, ans=0.125
2023-12-22 20:38:41,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=771960.0, ans=0.125
2023-12-22 20:38:53,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772026.6666666666, ans=0.1
2023-12-22 20:38:57,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=772026.6666666666, ans=0.04949747468305833
2023-12-22 20:39:03,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=772093.3333333334, ans=0.0
2023-12-22 20:39:08,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0
2023-12-22 20:39:10,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0
2023-12-22 20:39:13,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=772160.0, ans=0.0
2023-12-22 20:39:15,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0
2023-12-22 20:39:22,110 INFO [train.py:886] (1/4) Epoch 25, batch 1450, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4948909.68 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:39:24,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=772226.6666666666, ans=0.1
2023-12-22 20:39:40,581 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+01 3.046e+01 3.154e+01 3.328e+01 3.789e+01, threshold=6.308e+01, percent-clipped=0.0
2023-12-22 20:39:45,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2023-12-22 20:39:49,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=772360.0, ans=0.2
2023-12-22 20:40:02,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=772426.6666666666, ans=0.125
2023-12-22 20:40:14,147 INFO [train.py:886] (1/4) Epoch 25, batch 1500, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4951836.83 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:40:30,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0
2023-12-22 20:40:38,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.53 vs. limit=15.0
2023-12-22 20:40:53,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772760.0, ans=0.1
2023-12-22 20:40:54,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772826.6666666666, ans=0.1
2023-12-22 20:40:56,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=15.0
2023-12-22 20:41:02,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772826.6666666666, ans=0.1
2023-12-22 20:41:05,305 INFO [train.py:886] (1/4) Epoch 25, batch 1550, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4948829.11 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:41:05,556 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:41:06,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0
2023-12-22 20:41:08,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0
2023-12-22 20:41:09,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=772893.3333333334, ans=0.125
2023-12-22 20:41:22,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=772960.0, ans=0.0
2023-12-22 20:41:24,019 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.734e+01 3.051e+01 3.220e+01 3.373e+01 4.348e+01, threshold=6.439e+01, percent-clipped=0.0
2023-12-22 20:41:28,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=773026.6666666666, ans=0.125
2023-12-22 20:41:30,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=773026.6666666666, ans=0.125
2023-12-22 20:41:56,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=773226.6666666666, ans=0.2
2023-12-22 20:41:56,983 INFO [train.py:886] (1/4) Epoch 25, batch 1600, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4940403.65 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:42:15,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=773293.3333333334, ans=0.125
2023-12-22 20:42:16,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=773293.3333333334, ans=0.125
2023-12-22 20:42:31,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773426.6666666666, ans=0.1
2023-12-22 20:42:33,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=773426.6666666666, ans=0.0
2023-12-22 20:42:40,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=773493.3333333334, ans=0.2
2023-12-22 20:42:42,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=773493.3333333334, ans=0.1
2023-12-22 20:42:50,557 INFO [train.py:886] (1/4) Epoch 25, batch 1650, loss[loss=0.009122, audio_tagging_loss=0.009122, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4941959.51 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:43:08,965 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.090e+01 3.219e+01 3.390e+01 4.071e+01, threshold=6.439e+01, percent-clipped=0.0
2023-12-22 20:43:13,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=773693.3333333334, ans=0.125
2023-12-22 20:43:17,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=773693.3333333334, ans=0.125
2023-12-22 20:43:29,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=773760.0, ans=0.0
2023-12-22 20:43:31,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=773826.6666666666, ans=0.0
2023-12-22 20:43:42,128 INFO [train.py:886] (1/4) Epoch 25, batch 1700, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4947960.57 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:43:46,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0
2023-12-22 20:43:49,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=773893.3333333334, ans=0.125
2023-12-22 20:43:56,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=773960.0, ans=0.125
2023-12-22 20:44:08,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=774026.6666666666, ans=0.125
2023-12-22 20:44:13,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774093.3333333334, ans=0.1
2023-12-22 20:44:18,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0
2023-12-22 20:44:24,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=774160.0, ans=0.1
2023-12-22 20:44:29,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0
2023-12-22 20:44:33,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774226.6666666666, ans=0.125
2023-12-22 20:44:34,467 INFO [train.py:886] (1/4) Epoch 25, batch 1750, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4954933.03 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:44:46,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=774293.3333333334, ans=0.125
2023-12-22 20:44:50,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=774293.3333333334, ans=0.0
2023-12-22 20:44:52,983 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.733e+01 3.014e+01 3.131e+01 3.293e+01 4.047e+01, threshold=6.262e+01, percent-clipped=0.0
2023-12-22 20:45:13,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=774426.6666666666, ans=0.2
2023-12-22 20:45:13,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0
2023-12-22 20:45:20,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=774493.3333333334, ans=0.125
2023-12-22 20:45:23,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0
2023-12-22 20:45:26,041 INFO [train.py:886] (1/4) Epoch 25, batch 1800, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4957224.63 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:45:30,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=774560.0, ans=0.125
2023-12-22 20:45:41,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=774626.6666666666, ans=0.0
2023-12-22 20:45:46,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=774693.3333333334, ans=0.2
2023-12-22 20:46:18,656 INFO [train.py:886] (1/4) Epoch 25, batch 1850, loss[loss=0.01506, audio_tagging_loss=0.01506, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4955803.94 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:46:37,075 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.812e+01 3.072e+01 3.202e+01 3.378e+01 4.183e+01, threshold=6.404e+01, percent-clipped=0.0
2023-12-22 20:47:02,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0
2023-12-22 20:47:10,324 INFO [train.py:886] (1/4) Epoch 25, batch 1900, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4952249.35 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:47:19,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0
2023-12-22 20:47:21,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=775293.3333333334, ans=0.025
2023-12-22 20:47:45,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=775426.6666666666, ans=0.0
2023-12-22 20:47:49,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=775426.6666666666, ans=0.125
2023-12-22 20:47:58,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.84 vs. limit=10.0
2023-12-22 20:48:01,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5
2023-12-22 20:48:02,441 INFO [train.py:886] (1/4) Epoch 25, batch 1950, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4949685.87 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:48:21,576 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 3.045e+01 3.163e+01 3.260e+01 3.752e+01, threshold=6.326e+01, percent-clipped=0.0
2023-12-22 20:48:42,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=775826.6666666666, ans=0.125
2023-12-22 20:48:54,597 INFO [train.py:886] (1/4) Epoch 25, batch 2000, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4951494.97 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:49:05,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775960.0, ans=0.125
2023-12-22 20:49:31,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=776093.3333333334, ans=0.125
2023-12-22 20:49:33,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5
2023-12-22 20:49:43,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=776160.0, ans=0.125
2023-12-22 20:49:45,788 INFO [train.py:886] (1/4) Epoch 25, batch 2050, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4948111.54 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:49:47,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=776226.6666666666, ans=0.05
2023-12-22 20:50:02,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=776293.3333333334, ans=0.0
2023-12-22 20:50:04,275 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.672e+01 3.006e+01 3.133e+01 3.319e+01 3.992e+01, threshold=6.266e+01, percent-clipped=0.0
2023-12-22 20:50:10,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=776360.0, ans=0.125
2023-12-22 20:50:35,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=776493.3333333334, ans=0.0
2023-12-22 20:50:37,286 INFO [train.py:886] (1/4) Epoch 25, batch 2100, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4945804.46 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:50:58,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
2023-12-22 20:51:09,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=776760.0, ans=0.05
2023-12-22 20:51:28,940 INFO [train.py:886] (1/4) Epoch 25, batch 2150, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4953750.26 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:51:29,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=776893.3333333334, ans=10.0
2023-12-22 20:51:32,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=776893.3333333334, ans=0.1
2023-12-22 20:51:33,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=776893.3333333334, ans=0.0
2023-12-22 20:51:47,385 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.715e+01 3.098e+01 3.255e+01 3.441e+01 3.883e+01, threshold=6.510e+01, percent-clipped=0.0
2023-12-22 20:51:53,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=777026.6666666666, ans=0.2
2023-12-22 20:51:53,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777026.6666666666, ans=0.125
2023-12-22 20:51:56,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=777026.6666666666, ans=0.125
2023-12-22 20:52:03,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=777093.3333333334, ans=0.0
2023-12-22 20:52:03,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0
2023-12-22 20:52:10,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=22.5
2023-12-22 20:52:17,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2023-12-22 20:52:21,128 INFO [train.py:886] (1/4) Epoch 25, batch 2200, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4950042.38 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:52:22,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.07 vs. limit=22.5
2023-12-22 20:52:23,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=777226.6666666666, ans=10.0
2023-12-22 20:52:27,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.23 vs. limit=15.0
2023-12-22 20:52:46,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=777360.0, ans=0.2
2023-12-22 20:52:55,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=777426.6666666666, ans=0.125
2023-12-22 20:53:06,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=777493.3333333334, ans=0.0
2023-12-22 20:53:11,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=777493.3333333334, ans=0.0
2023-12-22 20:53:13,068 INFO [train.py:886] (1/4) Epoch 25, batch 2250, loss[loss=0.009952, audio_tagging_loss=0.009952, over 23930.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4945525.19 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:53:16,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=777560.0, ans=0.125
2023-12-22 20:53:30,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=777626.6666666666, ans=0.125
2023-12-22 20:53:30,764 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+01 3.106e+01 3.234e+01 3.414e+01 3.931e+01, threshold=6.468e+01, percent-clipped=0.0
2023-12-22 20:53:44,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0
2023-12-22 20:53:47,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0
2023-12-22 20:54:03,756 INFO [train.py:886] (1/4) Epoch 25, batch 2300, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4943215.71 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:54:29,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778026.6666666666, ans=0.1
2023-12-22 20:54:37,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=778093.3333333334, ans=0.125
2023-12-22 20:54:39,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0
2023-12-22 20:54:55,981 INFO [train.py:886] (1/4) Epoch 25, batch 2350, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4950234.63 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:55:06,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=778293.3333333334, ans=0.0
2023-12-22 20:55:08,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=778293.3333333334, ans=0.125
2023-12-22 20:55:10,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778293.3333333334, ans=0.1
2023-12-22 20:55:10,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=778293.3333333334, ans=0.0
2023-12-22 20:55:12,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=778293.3333333334, ans=0.07
2023-12-22 20:55:14,435 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 3.039e+01 3.160e+01 3.337e+01 4.563e+01, threshold=6.321e+01, percent-clipped=0.0
2023-12-22 20:55:31,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=778426.6666666666, ans=0.125
2023-12-22 20:55:47,503 INFO [train.py:886] (1/4) Epoch 25, batch 2400, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4949224.98 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:55:57,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=778626.6666666666, ans=0.125
2023-12-22 20:55:58,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=778626.6666666666, ans=0.0
2023-12-22 20:55:58,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0
2023-12-22 20:56:04,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=778626.6666666666, ans=0.0
2023-12-22 20:56:26,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0
2023-12-22 20:56:28,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=778826.6666666666, ans=0.015
2023-12-22 20:56:39,830 INFO [train.py:886] (1/4) Epoch 25, batch 2450, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4956018.10 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:56:43,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=778893.3333333334, ans=0.125
2023-12-22 20:56:58,316 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.730e+01 3.055e+01 3.201e+01 3.364e+01 3.978e+01, threshold=6.403e+01, percent-clipped=0.0
2023-12-22 20:57:05,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=779026.6666666666, ans=0.125
2023-12-22 20:57:17,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=779093.3333333334, ans=0.125
2023-12-22 20:57:31,413 INFO [train.py:886] (1/4) Epoch 25, batch 2500, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4951088.56 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:57:31,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=779226.6666666666, ans=0.07
2023-12-22 20:57:38,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0
2023-12-22 20:57:42,444 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:57:50,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=779360.0, ans=0.0
2023-12-22 20:57:54,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=779360.0, ans=0.0
2023-12-22 20:58:22,690 INFO [train.py:886] (1/4) Epoch 25, batch 2550, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4944848.13 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 20:58:26,339 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:58:37,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=779626.6666666666, ans=0.125
2023-12-22 20:58:40,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=779626.6666666666, ans=0.125
2023-12-22 20:58:41,775 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.771e+01 3.116e+01 3.261e+01 3.402e+01 3.994e+01, threshold=6.522e+01, percent-clipped=0.0
2023-12-22 20:58:47,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779693.3333333334, ans=0.1
2023-12-22 20:58:47,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=779693.3333333334, ans=0.125
2023-12-22 20:58:50,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0
2023-12-22 20:58:54,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=779760.0, ans=0.0
2023-12-22 20:58:57,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=779760.0, ans=0.2
2023-12-22 20:59:08,853 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:59:15,168 INFO [train.py:886] (1/4) Epoch 25, batch 2600, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4944825.70 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 20:59:23,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=779893.3333333334, ans=0.125
2023-12-22 21:00:07,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780226.6666666666, ans=0.125
2023-12-22 21:00:08,047 INFO [train.py:886] (1/4) Epoch 25, batch 2650, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4940869.75 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:00:11,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=780226.6666666666, ans=0.125
2023-12-22 21:00:13,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780226.6666666666, ans=0.1
2023-12-22 21:00:15,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
2023-12-22 21:00:22,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=780293.3333333334, ans=0.125
2023-12-22 21:00:26,452 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.067e+01 3.194e+01 3.330e+01 3.753e+01, threshold=6.389e+01, percent-clipped=0.0
2023-12-22 21:00:31,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=780360.0, ans=0.0
2023-12-22 21:00:34,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=780360.0, ans=0.0
2023-12-22 21:00:34,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=780360.0, ans=0.0
2023-12-22 21:00:59,656 INFO [train.py:886] (1/4) Epoch 25, batch 2700, loss[loss=0.01101, audio_tagging_loss=0.01101, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4948282.77 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:01:11,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=780626.6666666666, ans=0.125
2023-12-22 21:01:12,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=780626.6666666666, ans=0.2
2023-12-22 21:01:31,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780760.0, ans=0.1
2023-12-22 21:01:37,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=780760.0, ans=0.1
2023-12-22 21:01:51,169 INFO [train.py:886] (1/4) Epoch 25, batch 2750, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4953134.58 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:02:09,603 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.725e+01 3.037e+01 3.216e+01 3.345e+01 3.821e+01, threshold=6.432e+01, percent-clipped=0.0
2023-12-22 21:02:20,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=781026.6666666666, ans=0.125
2023-12-22 21:02:35,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=781160.0, ans=0.0
2023-12-22 21:02:42,965 INFO [train.py:886] (1/4) Epoch 25, batch 2800, loss[loss=0.01626, audio_tagging_loss=0.01626, over 24955.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4949780.09 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:02:56,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=781293.3333333334, ans=0.05
2023-12-22 21:02:56,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=781293.3333333334, ans=0.125
2023-12-22 21:02:57,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5
2023-12-22 21:03:09,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781360.0, ans=0.1
2023-12-22 21:03:17,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=781426.6666666666, ans=0.0
2023-12-22 21:03:24,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=781493.3333333334, ans=0.125
2023-12-22 21:03:29,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.30 vs. limit=15.0
2023-12-22 21:03:36,269 INFO [train.py:886] (1/4) Epoch 25, batch 2850, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4943287.93 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:03:54,792 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.736e+01 3.071e+01 3.202e+01 3.384e+01 4.019e+01, threshold=6.405e+01, percent-clipped=0.0
2023-12-22 21:04:00,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=12.0
2023-12-22 21:04:16,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=781826.6666666666, ans=0.125
2023-12-22 21:04:24,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0
2023-12-22 21:04:26,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=12.0
2023-12-22 21:04:28,005 INFO [train.py:886] (1/4) Epoch 25, batch 2900, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4948811.13 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:04:34,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=781893.3333333334, ans=0.125
2023-12-22 21:04:56,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=782026.6666666666, ans=0.025
2023-12-22 21:05:18,750 INFO [train.py:886] (1/4) Epoch 25, batch 2950, loss[loss=0.009185, audio_tagging_loss=0.009185, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4951556.40 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:05:19,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0
2023-12-22 21:05:29,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0
2023-12-22 21:05:36,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782293.3333333334, ans=0.1
2023-12-22 21:05:37,125 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.031e+01 3.168e+01 3.321e+01 3.698e+01, threshold=6.337e+01, percent-clipped=0.0
2023-12-22 21:05:44,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=782360.0, ans=0.125
2023-12-22 21:05:54,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=782426.6666666666, ans=0.0
2023-12-22 21:06:10,415 INFO [train.py:886] (1/4) Epoch 25, batch 3000, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4955714.51 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:06:10,415 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 21:06:24,662 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8560, 3.1054, 3.3062, 2.6153, 2.6270, 3.2062, 2.7484, 2.5566], device='cuda:1')
2023-12-22 21:06:31,864 INFO [train.py:917] (1/4) Epoch 25, validation: loss=0.0331, audio_tagging_loss=0.0331, over 3737520.00 frames.
2023-12-22 21:06:31,865 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-22 21:06:38,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782560.0, ans=0.1
2023-12-22 21:06:43,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=782626.6666666666, ans=0.125
2023-12-22 21:06:50,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=782626.6666666666, ans=0.125
2023-12-22 21:06:52,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=782693.3333333334, ans=0.2
2023-12-22 21:06:57,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=782693.3333333334, ans=0.0
2023-12-22 21:07:09,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=15.0
2023-12-22 21:07:10,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2023-12-22 21:07:23,403 INFO [train.py:886] (1/4) Epoch 25, batch 3050, loss[loss=0.01673, audio_tagging_loss=0.01673, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4960207.27 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0
2023-12-22 21:07:30,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=782893.3333333334, ans=0.02
2023-12-22 21:07:32,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782893.3333333334, ans=0.1
2023-12-22 21:07:38,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782960.0, ans=0.1
2023-12-22 21:07:39,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0
2023-12-22 21:07:40,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=782960.0, ans=0.125
2023-12-22 21:07:41,967 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.821e+01 3.063e+01 3.171e+01 3.346e+01 3.819e+01, threshold=6.341e+01, percent-clipped=0.0
2023-12-22 21:08:15,768 INFO [train.py:886] (1/4) Epoch 25, batch 3100, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4958301.32 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:08:20,465 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:08:30,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783293.3333333334, ans=0.1
2023-12-22 21:08:34,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=783293.3333333334, ans=0.1
2023-12-22 21:08:51,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=783426.6666666666, ans=0.0
2023-12-22 21:08:55,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=783426.6666666666, ans=0.125
2023-12-22 21:09:07,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0
2023-12-22 21:09:08,011 INFO [train.py:886] (1/4) Epoch 25, batch 3150, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4954266.86 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:09:14,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0
2023-12-22 21:09:17,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=22.5
2023-12-22 21:09:23,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=783626.6666666666, ans=0.125
2023-12-22 21:09:26,509 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.818e+01 3.103e+01 3.282e+01 3.438e+01 3.839e+01, threshold=6.565e+01, percent-clipped=0.0
2023-12-22 21:09:33,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783693.3333333334, ans=0.1
2023-12-22 21:09:33,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783693.3333333334, ans=0.1
2023-12-22 21:09:45,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0
2023-12-22 21:09:59,530 INFO [train.py:886] (1/4) Epoch 25, batch 3200, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4947515.42 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:10:09,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.21 vs. limit=12.0
2023-12-22 21:10:16,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0
2023-12-22 21:10:23,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=784026.6666666666, ans=0.125
2023-12-22 21:10:40,866 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 21:10:41,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=784160.0, ans=0.0
2023-12-22 21:10:51,692 INFO [train.py:886] (1/4) Epoch 25, batch 3250, loss[loss=0.01693, audio_tagging_loss=0.01693, over 24750.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4949315.67 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:10:51,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=784226.6666666666, ans=0.0
2023-12-22 21:11:04,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=784293.3333333334, ans=0.5
2023-12-22 21:11:10,156 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.040e+01 3.210e+01 3.374e+01 3.789e+01, threshold=6.419e+01, percent-clipped=0.0
2023-12-22 21:11:34,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784493.3333333334, ans=0.1
2023-12-22 21:11:44,422 INFO [train.py:886] (1/4) Epoch 25, batch 3300, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4949318.90 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:11:44,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=784560.0, ans=0.125
2023-12-22 21:11:47,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=784560.0, ans=0.125
2023-12-22 21:11:57,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=784626.6666666666, ans=0.125
2023-12-22 21:12:00,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=784626.6666666666, ans=0.125
2023-12-22 21:12:14,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=12.0
2023-12-22 21:12:18,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784760.0, ans=0.1
2023-12-22 21:12:24,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=784760.0, ans=0.0
2023-12-22 21:12:29,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=784826.6666666666, ans=0.125
2023-12-22 21:12:36,221 INFO [train.py:886] (1/4) Epoch 25, batch 3350, loss[loss=0.01496, audio_tagging_loss=0.01496, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4952579.72 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:12:54,043 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.707e+01 3.036e+01 3.188e+01 3.323e+01 3.889e+01, threshold=6.376e+01, percent-clipped=0.0
2023-12-22 21:13:20,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=785160.0, ans=0.025
2023-12-22 21:13:27,267 INFO [train.py:886] (1/4) Epoch 25, batch 3400, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4956800.00 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:13:47,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=12.0
2023-12-22 21:13:53,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5
2023-12-22 21:14:01,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=785426.6666666666, ans=0.125
2023-12-22 21:14:07,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=785426.6666666666, ans=0.07
2023-12-22 21:14:19,129 INFO [train.py:886] (1/4) Epoch 25, batch 3450, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4954151.31 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:14:38,181 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.865e+01 3.134e+01 3.253e+01 3.389e+01 3.963e+01, threshold=6.506e+01, percent-clipped=0.0
2023-12-22 21:14:47,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785693.3333333334, ans=0.1
2023-12-22 21:15:06,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=785826.6666666666, ans=0.125
2023-12-22 21:15:11,235 INFO [train.py:886] (1/4) Epoch 25, batch 3500, loss[loss=0.01389, audio_tagging_loss=0.01389, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4951218.04 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0
2023-12-22 21:15:12,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=785893.3333333334, ans=0.09899494936611666
2023-12-22 21:15:13,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5
2023-12-22 21:15:16,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0
2023-12-22 21:15:17,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0
2023-12-22 21:15:42,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=786093.3333333334, ans=0.125
2023-12-22 21:15:44,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5
limit=22.5 2023-12-22 21:16:02,904 INFO [train.py:886] (1/4) Epoch 25, batch 3550, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4953476.94 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:16:07,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=786226.6666666666, ans=0.125 2023-12-22 21:16:22,262 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.633e+01 3.021e+01 3.171e+01 3.367e+01 3.812e+01, threshold=6.343e+01, percent-clipped=0.0 2023-12-22 21:16:23,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=786360.0, ans=0.2 2023-12-22 21:16:27,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=786360.0, ans=0.125 2023-12-22 21:16:30,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=786360.0, ans=0.125 2023-12-22 21:16:46,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=786493.3333333334, ans=0.125 2023-12-22 21:16:54,960 INFO [train.py:886] (1/4) Epoch 25, batch 3600, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4952122.36 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:17:05,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=786626.6666666666, ans=0.0 2023-12-22 21:17:08,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=786626.6666666666, ans=0.125 2023-12-22 21:17:14,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=786626.6666666666, ans=0.0 2023-12-22 21:17:16,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786693.3333333334, ans=0.1 2023-12-22 21:17:26,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=786760.0, ans=0.125 2023-12-22 21:17:35,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5 2023-12-22 21:17:37,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=786826.6666666666, ans=0.0 2023-12-22 21:17:47,388 INFO [train.py:886] (1/4) Epoch 25, batch 3650, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4952061.12 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:17:47,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=786893.3333333334, ans=0.125 2023-12-22 21:17:50,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. 
limit=12.0 2023-12-22 21:17:58,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=786960.0, ans=0.125 2023-12-22 21:18:04,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=786960.0, ans=0.2 2023-12-22 21:18:05,038 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+01 2.980e+01 3.189e+01 3.343e+01 3.889e+01, threshold=6.377e+01, percent-clipped=0.0 2023-12-22 21:18:21,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=787093.3333333334, ans=0.125 2023-12-22 21:18:28,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=787160.0, ans=0.1 2023-12-22 21:18:35,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=787160.0, ans=0.0 2023-12-22 21:18:35,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=787160.0, ans=0.125 2023-12-22 21:18:38,511 INFO [train.py:886] (1/4) Epoch 25, batch 3700, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4953046.87 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:18:41,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=787226.6666666666, ans=0.125 2023-12-22 21:18:46,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2023-12-22 21:19:09,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787426.6666666666, ans=0.125 2023-12-22 21:19:23,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2023-12-22 21:19:30,718 INFO [train.py:886] (1/4) Epoch 25, batch 3750, loss[loss=0.01523, audio_tagging_loss=0.01523, over 24938.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4950667.89 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:19:34,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=787560.0, ans=0.125 2023-12-22 21:19:46,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=787626.6666666666, ans=0.125 2023-12-22 21:19:49,015 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.099e+01 3.227e+01 3.364e+01 3.864e+01, threshold=6.453e+01, percent-clipped=0.0 2023-12-22 21:19:51,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=787693.3333333334, ans=0.125 2023-12-22 21:20:12,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-12-22 21:20:22,309 INFO [train.py:886] (1/4) Epoch 25, batch 3800, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24750.00 frames. 
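In every "Clipping_scale=2.0" WARNING in this log the five grad-norm quartiles read (min, 25%, median, 75%, max), and the printed threshold equals Clipping_scale times the median up to rounding (e.g. 2.0 x 3.189e+01 ~= 6.377e+01 just above). A simplified stand-in for that kind of median-based clipping, written in plain PyTorch rather than the actual optim.py code:

import torch

def clip_by_median(params, recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles of recently observed gradient norms, as printed in the log.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # scale times the median
    total = torch.stack([p.grad.norm() for p in params if p.grad is not None]).norm()
    if total > threshold:                      # rescale only when over threshold
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / total)
    return q, threshold

model = torch.nn.Linear(16, 16)
model(torch.randn(4, 16)).sum().backward()
q, thr = clip_by_median(list(model.parameters()),
                        torch.tensor([26.9, 30.4, 32.1, 33.7, 37.9]))
print(q.tolist(), float(thr))  # threshold = 2.0 * 32.1 = 64.2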
], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4947918.42 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:20:22,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=787893.3333333334, ans=0.125 2023-12-22 21:20:27,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=787893.3333333334, ans=0.09899494936611666 2023-12-22 21:20:33,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=787960.0, ans=0.0 2023-12-22 21:21:14,615 INFO [train.py:886] (1/4) Epoch 25, batch 3850, loss[loss=0.01329, audio_tagging_loss=0.01329, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4949599.41 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:21:29,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=788293.3333333334, ans=0.125 2023-12-22 21:21:30,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=788293.3333333334, ans=0.0 2023-12-22 21:21:33,016 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.703e+01 3.115e+01 3.236e+01 3.422e+01 4.867e+01, threshold=6.472e+01, percent-clipped=0.0 2023-12-22 21:21:57,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-12-22 21:21:59,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=788493.3333333334, ans=0.125 2023-12-22 21:22:00,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=788493.3333333334, ans=0.035 2023-12-22 21:22:03,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=788493.3333333334, ans=0.125 2023-12-22 21:22:04,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=788493.3333333334, ans=0.125 2023-12-22 21:22:06,290 INFO [train.py:886] (1/4) Epoch 25, batch 3900, loss[loss=0.01255, audio_tagging_loss=0.01255, over 24750.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4950907.67 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:22:08,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=788560.0, ans=0.125 2023-12-22 21:22:56,936 INFO [train.py:886] (1/4) Epoch 25, batch 3950, loss[loss=0.01275, audio_tagging_loss=0.01275, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4949586.43 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:23:04,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.57 vs. 
limit=22.5 2023-12-22 21:23:05,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=788893.3333333334, ans=0.0 2023-12-22 21:23:16,157 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.672e+01 3.014e+01 3.191e+01 3.353e+01 3.763e+01, threshold=6.383e+01, percent-clipped=0.0 2023-12-22 21:23:18,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-12-22 21:23:48,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2023-12-22 21:23:49,435 INFO [train.py:886] (1/4) Epoch 25, batch 4000, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24058.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4951431.85 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 128.0 2023-12-22 21:24:05,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=789293.3333333334, ans=0.125 2023-12-22 21:24:19,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=789426.6666666666, ans=0.1 2023-12-22 21:24:41,230 INFO [train.py:886] (1/4) Epoch 25, batch 4050, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4953247.36 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:24:44,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.81 vs. limit=10.0 2023-12-22 21:24:59,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=789626.6666666666, ans=0.125 2023-12-22 21:25:01,300 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.755e+01 3.123e+01 3.229e+01 3.371e+01 4.451e+01, threshold=6.458e+01, percent-clipped=0.0 2023-12-22 21:25:10,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=789693.3333333334, ans=0.2 2023-12-22 21:25:30,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=15.0 2023-12-22 21:25:33,453 INFO [train.py:886] (1/4) Epoch 25, batch 4100, loss[loss=0.01638, audio_tagging_loss=0.01638, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4944747.43 frames. 
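Note grad_scale above: 128.0 at batch 4000, back to 64.0 by batch 4050. That is the usual dynamic loss-scaling pattern in mixed-precision training: the scale typically doubles after a long run of overflow-free steps and halves when a non-finite gradient appears. A generic PyTorch AMP sketch (requires a CUDA device; not the actual loop in train.py):

import torch

model = torch.nn.Linear(128, 10).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=4.3e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=64.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

with torch.cuda.amp.autocast():
    loss = model(torch.randn(8, 128, device="cuda")).square().mean()
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(opt)                # unscales grads; skips the step on overflow
scaler.update()                 # grows or backs off the scale
print(scaler.get_scale())       # the analogue of grad_scale in the log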
], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:25:42,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=789893.3333333334, ans=0.125 2023-12-22 21:25:47,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=789960.0, ans=0.0 2023-12-22 21:25:54,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=790026.6666666666, ans=0.125 2023-12-22 21:25:57,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=790026.6666666666, ans=0.125 2023-12-22 21:26:06,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790093.3333333334, ans=0.125 2023-12-22 21:26:24,977 INFO [train.py:886] (1/4) Epoch 25, batch 4150, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4943911.65 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:26:31,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=790226.6666666666, ans=15.0 2023-12-22 21:26:37,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=790293.3333333334, ans=0.0 2023-12-22 21:26:42,271 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:26:44,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=790293.3333333334, ans=0.1 2023-12-22 21:26:44,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 3.070e+01 3.176e+01 3.307e+01 3.814e+01, threshold=6.351e+01, percent-clipped=0.0 2023-12-22 21:26:53,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=790360.0, ans=0.0 2023-12-22 21:27:02,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=790426.6666666666, ans=0.2 2023-12-22 21:27:10,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790493.3333333334, ans=0.1 2023-12-22 21:27:11,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.95 vs. limit=22.5 2023-12-22 21:27:16,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=790560.0, ans=0.125 2023-12-22 21:27:17,233 INFO [train.py:886] (1/4) Epoch 25, batch 4200, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4946745.70 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:27:24,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. 
limit=15.0 2023-12-22 21:27:43,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=790693.3333333334, ans=0.125 2023-12-22 21:27:54,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=790760.0, ans=0.0 2023-12-22 21:27:55,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=790760.0, ans=0.125 2023-12-22 21:28:09,323 INFO [train.py:886] (1/4) Epoch 25, batch 4250, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24098.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4946049.01 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:28:18,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=790893.3333333334, ans=10.0 2023-12-22 21:28:27,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=790960.0, ans=0.125 2023-12-22 21:28:29,339 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.694e+01 3.059e+01 3.182e+01 3.332e+01 3.915e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-22 21:28:35,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=791026.6666666666, ans=0.0 2023-12-22 21:28:42,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-12-22 21:28:46,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-12-22 21:29:01,506 INFO [train.py:886] (1/4) Epoch 25, batch 4300, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4954063.66 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:29:06,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-12-22 21:29:18,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=791293.3333333334, ans=0.2 2023-12-22 21:29:31,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791426.6666666666, ans=0.1 2023-12-22 21:29:39,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=791426.6666666666, ans=0.125 2023-12-22 21:29:52,955 INFO [train.py:886] (1/4) Epoch 25, batch 4350, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4950367.57 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:30:04,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0 2023-12-22 21:30:10,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.02 vs. 
limit=22.5 2023-12-22 21:30:12,257 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.736e+01 3.142e+01 3.246e+01 3.434e+01 4.775e+01, threshold=6.491e+01, percent-clipped=0.0 2023-12-22 21:30:12,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=791693.3333333334, ans=0.125 2023-12-22 21:30:16,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=791693.3333333334, ans=0.2 2023-12-22 21:30:27,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=791760.0, ans=0.0 2023-12-22 21:30:41,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2023-12-22 21:30:43,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=791893.3333333334, ans=0.0 2023-12-22 21:30:44,409 INFO [train.py:886] (1/4) Epoch 25, batch 4400, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4948960.90 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:30:47,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=791893.3333333334, ans=0.0 2023-12-22 21:31:02,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=791960.0, ans=0.035 2023-12-22 21:31:04,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=792026.6666666666, ans=0.04949747468305833 2023-12-22 21:31:15,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-22 21:31:25,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=792160.0, ans=0.1 2023-12-22 21:31:36,750 INFO [train.py:886] (1/4) Epoch 25, batch 4450, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24055.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4944176.63 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:31:50,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792293.3333333334, ans=0.1 2023-12-22 21:31:55,987 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+01 3.096e+01 3.289e+01 3.454e+01 4.109e+01, threshold=6.578e+01, percent-clipped=0.0 2023-12-22 21:31:56,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. 
limit=15.0 2023-12-22 21:32:02,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792360.0, ans=0.1 2023-12-22 21:32:07,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=792426.6666666666, ans=0.125 2023-12-22 21:32:11,546 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:32:18,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2023-12-22 21:32:22,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=792493.3333333334, ans=0.2 2023-12-22 21:32:28,148 INFO [train.py:886] (1/4) Epoch 25, batch 4500, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4947263.14 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:32:51,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0 2023-12-22 21:32:55,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=792693.3333333334, ans=0.1 2023-12-22 21:33:19,564 INFO [train.py:886] (1/4) Epoch 25, batch 4550, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4951475.93 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:33:39,624 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+01 3.079e+01 3.195e+01 3.326e+01 3.977e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 21:33:48,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=793026.6666666666, ans=0.0 2023-12-22 21:33:48,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=793026.6666666666, ans=0.125 2023-12-22 21:34:08,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-12-22 21:34:10,998 INFO [train.py:886] (1/4) Epoch 25, batch 4600, loss[loss=0.01319, audio_tagging_loss=0.01319, over 24900.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4949673.16 frames. 
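The Whitening entries (metric=... vs. limit=...) track how far a module's output covariance is from "white", and show up in the log when the statistic nears its limit. As an assumption-labeled stand-in for the printed metric (the real formula lives in scaling.py's Whiten module and may differ), one can define a statistic that equals 1.0 exactly when the channel covariance is a multiple of the identity and grows with anisotropy:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); equals 1.0 iff cov = scalar * identity
    # (Cauchy-Schwarz on the eigenvalues), larger for anisotropic features.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return d * (cov * cov).sum() / cov.trace() ** 2

print(whitening_metric(torch.randn(10000, 64)))                      # ~1, white
print(whitening_metric(torch.randn(10000, 4) @ torch.randn(4, 64)))  # >> 1, rank-4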
], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:34:18,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793226.6666666666, ans=0.1 2023-12-22 21:34:34,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793360.0, ans=0.1 2023-12-22 21:34:40,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=793426.6666666666, ans=0.125 2023-12-22 21:34:45,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793426.6666666666, ans=0.1 2023-12-22 21:34:46,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=793426.6666666666, ans=0.0 2023-12-22 21:34:56,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2023-12-22 21:35:02,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=793560.0, ans=0.125 2023-12-22 21:35:02,781 INFO [train.py:886] (1/4) Epoch 25, batch 4650, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4955823.46 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:35:13,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=793626.6666666666, ans=0.1 2023-12-22 21:35:22,748 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 3.053e+01 3.192e+01 3.319e+01 4.042e+01, threshold=6.384e+01, percent-clipped=0.0 2023-12-22 21:35:44,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793826.6666666666, ans=0.1 2023-12-22 21:35:45,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=793826.6666666666, ans=0.025 2023-12-22 21:35:53,215 INFO [train.py:886] (1/4) Epoch 25, batch 4700, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4952222.96 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:35:58,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=793893.3333333334, ans=0.0 2023-12-22 21:36:04,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. 
limit=15.0 2023-12-22 21:36:27,824 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:36:32,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=794160.0, ans=0.0 2023-12-22 21:36:34,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=794160.0, ans=0.125 2023-12-22 21:36:35,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=794160.0, ans=0.125 2023-12-22 21:36:40,976 INFO [train.py:886] (1/4) Epoch 25, batch 4750, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4949945.82 frames. ], batch size: 99, lr: 4.28e-03, grad_scale: 64.0 2023-12-22 21:37:15,784 INFO [train.py:886] (1/4) Epoch 26, batch 0, loss[loss=0.03604, audio_tagging_loss=0.03604, over 21321.00 frames. ], tot_loss[loss=0.03604, audio_tagging_loss=0.03604, over 21321.00 frames. ], batch size: 107, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:37:15,784 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 21:37:37,120 INFO [train.py:917] (1/4) Epoch 26, validation: loss=0.03272, audio_tagging_loss=0.03272, over 3737520.00 frames. 2023-12-22 21:37:37,120 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 21:37:38,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=15.0 2023-12-22 21:37:39,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=15.0 2023-12-22 21:37:41,534 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.775e+01 3.157e+01 3.286e+01 3.436e+01 9.011e+01, threshold=6.571e+01, percent-clipped=3.0 2023-12-22 21:37:47,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=794400.0, ans=0.0 2023-12-22 21:37:49,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794400.0, ans=0.125 2023-12-22 21:37:53,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=794400.0, ans=0.125 2023-12-22 21:38:28,753 INFO [train.py:886] (1/4) Epoch 26, batch 50, loss[loss=0.01643, audio_tagging_loss=0.01643, over 25000.00 frames. ], tot_loss[loss=0.02076, audio_tagging_loss=0.02076, over 1115912.01 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:38:41,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=15.0 2023-12-22 21:38:43,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. 
limit=22.5 2023-12-22 21:39:01,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=794866.6666666666, ans=0.2 2023-12-22 21:39:12,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794933.3333333334, ans=0.1 2023-12-22 21:39:14,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2023-12-22 21:39:15,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=794933.3333333334, ans=0.0 2023-12-22 21:39:20,284 INFO [train.py:886] (1/4) Epoch 26, batch 100, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01795, audio_tagging_loss=0.01795, over 1968081.47 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:39:24,050 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.050e+01 3.579e+01 3.859e+01 4.416e+01 7.347e+01, threshold=7.717e+01, percent-clipped=4.0 2023-12-22 21:39:36,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=795066.6666666666, ans=0.125 2023-12-22 21:39:38,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795066.6666666666, ans=0.1 2023-12-22 21:39:44,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=795133.3333333334, ans=0.125 2023-12-22 21:40:11,822 INFO [train.py:886] (1/4) Epoch 26, batch 150, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 2633510.97 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:40:13,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0 2023-12-22 21:40:13,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795333.3333333334, ans=0.1 2023-12-22 21:40:20,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795333.3333333334, ans=0.125 2023-12-22 21:40:33,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=795466.6666666666, ans=0.2 2023-12-22 21:40:40,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=795466.6666666666, ans=0.125 2023-12-22 21:40:43,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=795533.3333333334, ans=0.125 2023-12-22 21:40:55,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=795600.0, ans=0.125 2023-12-22 21:41:02,957 INFO [train.py:886] (1/4) Epoch 26, batch 200, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 3148418.10 frames. 
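The tot_loss frame counts just after the epoch-26 restart (about 1.12M, 1.97M, 2.63M and 3.15M frames at batches 50, 100, 150 and 200, saturating near 4.95M in steady state) are consistent with a frame-weighted running sum decayed by roughly 0.995 per batch, rather than a plain cumulative total. A sketch under that inferred-decay assumption; the actual tracker in train.py may differ in detail:

class RunningLoss:
    # Exponentially decayed, frame-weighted loss average.
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of (per-frame loss * frames)
        self.frames = 0.0     # matching decayed frame count

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(200):                      # ~25k frames per batch, as logged
    tracker.update(batch_loss=0.0154, batch_frames=24900.0)
print(round(tracker.frames))              # ~3.15M, close to the batch-200 line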
], batch size: 99, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:41:04,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=795666.6666666666, ans=0.125 2023-12-22 21:41:07,377 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.808e+01 3.182e+01 3.315e+01 3.522e+01 3.900e+01, threshold=6.631e+01, percent-clipped=0.0 2023-12-22 21:41:08,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2023-12-22 21:41:19,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-12-22 21:41:24,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2023-12-22 21:41:32,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=795800.0, ans=0.0 2023-12-22 21:41:37,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=795866.6666666666, ans=0.1 2023-12-22 21:41:50,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=795933.3333333334, ans=0.2 2023-12-22 21:41:52,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795933.3333333334, ans=0.1 2023-12-22 21:41:55,400 INFO [train.py:886] (1/4) Epoch 26, batch 250, loss[loss=0.009914, audio_tagging_loss=0.009914, over 25000.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 3547308.02 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:42:18,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=796133.3333333334, ans=0.125 2023-12-22 21:42:33,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=796200.0, ans=0.0 2023-12-22 21:42:47,247 INFO [train.py:886] (1/4) Epoch 26, batch 300, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 3854545.36 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:42:50,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. 
limit=15.0 2023-12-22 21:42:50,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.083e+01 3.251e+01 3.397e+01 4.143e+01, threshold=6.503e+01, percent-clipped=0.0 2023-12-22 21:43:25,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=796533.3333333334, ans=0.0 2023-12-22 21:43:25,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=796533.3333333334, ans=0.1 2023-12-22 21:43:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=796600.0, ans=0.125 2023-12-22 21:43:39,418 INFO [train.py:886] (1/4) Epoch 26, batch 350, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4092871.29 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:43:43,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=796666.6666666666, ans=0.125 2023-12-22 21:44:04,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796800.0, ans=0.1 2023-12-22 21:44:29,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=797000.0, ans=0.125 2023-12-22 21:44:31,540 INFO [train.py:886] (1/4) Epoch 26, batch 400, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4279860.01 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:44:33,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=797000.0, ans=0.125 2023-12-22 21:44:36,030 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.737e+01 3.039e+01 3.196e+01 3.377e+01 3.798e+01, threshold=6.392e+01, percent-clipped=0.0 2023-12-22 21:44:50,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797066.6666666666, ans=0.1 2023-12-22 21:44:52,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=797133.3333333334, ans=0.0 2023-12-22 21:45:00,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=797133.3333333334, ans=0.0 2023-12-22 21:45:15,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=797266.6666666666, ans=0.125 2023-12-22 21:45:23,758 INFO [train.py:886] (1/4) Epoch 26, batch 450, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4426620.13 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:45:35,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=797400.0, ans=0.125 2023-12-22 21:46:08,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=797600.0, ans=0.0 2023-12-22 21:46:14,592 INFO [train.py:886] (1/4) Epoch 26, batch 500, loss[loss=0.01608, audio_tagging_loss=0.01608, over 24750.00 frames. 
], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4539911.24 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:46:16,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2023-12-22 21:46:19,023 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.814e+01 3.074e+01 3.193e+01 3.348e+01 4.490e+01, threshold=6.386e+01, percent-clipped=0.0 2023-12-22 21:46:21,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=797666.6666666666, ans=0.07 2023-12-22 21:46:35,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=797800.0, ans=0.125 2023-12-22 21:46:41,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-12-22 21:46:42,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=797800.0, ans=0.0 2023-12-22 21:46:56,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.05 vs. limit=22.5 2023-12-22 21:47:05,465 INFO [train.py:886] (1/4) Epoch 26, batch 550, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4635283.34 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:47:21,140 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:47:30,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=798133.3333333334, ans=0.0 2023-12-22 21:47:34,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2023-12-22 21:47:43,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=12.0 2023-12-22 21:47:57,531 INFO [train.py:886] (1/4) Epoch 26, batch 600, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24947.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4701212.48 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:47:59,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2023-12-22 21:48:01,263 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.107e+01 3.234e+01 3.355e+01 4.218e+01, threshold=6.468e+01, percent-clipped=0.0 2023-12-22 21:48:11,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=798400.0, ans=0.125 2023-12-22 21:48:29,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=798533.3333333334, ans=0.125 2023-12-22 21:48:48,528 INFO [train.py:886] (1/4) Epoch 26, batch 650, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4754693.10 frames. 
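Several entries above schedule bypass.skip_rate (e.g. ans=0.07) and bypass.scale_min (ans=0.2). A toy reading of such a bypass, assuming it stochastically skips the wrapped module during training and otherwise blends input and output with a learned per-channel scale clamped from below; the real Zipformer bypass may differ:

import torch

class Bypass(torch.nn.Module):
    def __init__(self, module: torch.nn.Module, dim: int,
                 skip_rate: float = 0.07, scale_min: float = 0.2):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate
        self.scale_min = scale_min
        self.scale = torch.nn.Parameter(torch.full((dim,), 0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Occasionally bypass the module entirely (training-time regularizer).
        if self.training and torch.rand(()) < self.skip_rate:
            return x
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (self.module(x) - x)   # (1 - s) * x + s * module(x)

layer = Bypass(torch.nn.Linear(256, 256), dim=256)
print(layer(torch.randn(4, 256)).shape)       # torch.Size([4, 256])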
], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:49:05,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798733.3333333334, ans=0.1 2023-12-22 21:49:15,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=798800.0, ans=0.0 2023-12-22 21:49:20,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-12-22 21:49:36,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798933.3333333334, ans=0.125 2023-12-22 21:49:40,196 INFO [train.py:886] (1/4) Epoch 26, batch 700, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4797060.40 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:49:43,944 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.098e+01 3.257e+01 3.432e+01 3.751e+01, threshold=6.513e+01, percent-clipped=0.0 2023-12-22 21:49:57,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-22 21:50:01,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-12-22 21:50:30,732 INFO [train.py:886] (1/4) Epoch 26, batch 750, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4834946.22 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:50:41,924 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:50:43,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=799400.0, ans=0.125 2023-12-22 21:50:43,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=799400.0, ans=0.0 2023-12-22 21:50:44,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2023-12-22 21:51:05,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2023-12-22 21:51:06,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=799533.3333333334, ans=0.125 2023-12-22 21:51:22,656 INFO [train.py:886] (1/4) Epoch 26, batch 800, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4865015.06 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:51:25,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. 
limit=15.0 2023-12-22 21:51:26,417 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.081e+01 3.211e+01 3.380e+01 3.889e+01, threshold=6.421e+01, percent-clipped=0.0 2023-12-22 21:51:27,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-12-22 21:51:29,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5 2023-12-22 21:51:31,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=799666.6666666666, ans=0.0 2023-12-22 21:51:37,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799733.3333333334, ans=0.125 2023-12-22 21:51:41,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=799733.3333333334, ans=0.2 2023-12-22 21:51:57,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=799866.6666666666, ans=0.125 2023-12-22 21:52:06,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=799933.3333333334, ans=15.0 2023-12-22 21:52:17,741 INFO [train.py:886] (1/4) Epoch 26, batch 850, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4889309.73 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:52:21,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2023-12-22 21:52:30,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=800066.6666666666, ans=0.125 2023-12-22 21:52:44,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=800133.3333333334, ans=0.125 2023-12-22 21:52:50,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=800200.0, ans=0.025 2023-12-22 21:52:54,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=12.0 2023-12-22 21:53:00,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2023-12-22 21:53:06,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0 2023-12-22 21:53:08,751 INFO [train.py:886] (1/4) Epoch 26, batch 900, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4907439.70 frames. 
], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:53:13,187 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.047e+01 3.202e+01 3.314e+01 4.091e+01, threshold=6.405e+01, percent-clipped=0.0 2023-12-22 21:53:43,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800533.3333333334, ans=0.1 2023-12-22 21:53:49,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=800533.3333333334, ans=0.2 2023-12-22 21:54:01,662 INFO [train.py:886] (1/4) Epoch 26, batch 950, loss[loss=0.01308, audio_tagging_loss=0.01308, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4915936.03 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:54:02,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=800666.6666666666, ans=0.07 2023-12-22 21:54:51,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-12-22 21:54:53,027 INFO [train.py:886] (1/4) Epoch 26, batch 1000, loss[loss=0.01215, audio_tagging_loss=0.01215, over 23975.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4918471.10 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:54:56,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=801000.0, ans=0.125 2023-12-22 21:54:56,818 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.794e+01 3.055e+01 3.210e+01 3.341e+01 4.117e+01, threshold=6.420e+01, percent-clipped=0.0 2023-12-22 21:54:59,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=801000.0, ans=0.125 2023-12-22 21:55:32,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=801200.0, ans=0.0 2023-12-22 21:55:41,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=801266.6666666666, ans=0.125 2023-12-22 21:55:43,851 INFO [train.py:886] (1/4) Epoch 26, batch 1050, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4926399.64 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:55:55,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=801400.0, ans=0.0 2023-12-22 21:55:55,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=801400.0, ans=0.0 2023-12-22 21:56:10,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=801466.6666666666, ans=0.125 2023-12-22 21:56:20,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-12-22 21:56:36,435 INFO [train.py:886] (1/4) Epoch 26, batch 1100, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4935027.59 frames. 
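The many balancer entries (min_positive, max_abs, balancer probs around 0.125) constrain per-channel activation statistics. icefall's Balancer enforces these by editing gradients directly inside scaling.py; as a rough penalty-based illustration of the same constraints (differentiable proxies invented here, not the real implementation):

import torch

def balancer_penalty(x: torch.Tensor, min_positive: float = 0.05,
                     max_positive: float = 0.95,
                     max_abs: float = 10.0) -> torch.Tensor:
    # x: (num_frames, num_channels). Penalize channels whose soft fraction of
    # positive values leaves [min_positive, max_positive], or whose mean
    # magnitude exceeds max_abs.
    tau = x.abs().mean() + 1e-4
    pos = torch.sigmoid(x / tau).mean(dim=0)   # soft "fraction positive"
    mag = x.abs().mean(dim=0)
    pen = ((min_positive - pos).clamp(min=0.0)
           + (pos - max_positive).clamp(min=0.0)
           + (mag - max_abs).clamp(min=0.0))
    return pen.sum()

print(balancer_penalty(torch.randn(100, 64)).item())         # 0.0: in range
print(balancer_penalty(torch.randn(100, 64) * 50.0).item())  # > 0: too large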
], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:56:40,902 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.668e+01 3.073e+01 3.229e+01 3.408e+01 3.887e+01, threshold=6.457e+01, percent-clipped=0.0 2023-12-22 21:56:44,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801666.6666666666, ans=0.125 2023-12-22 21:56:47,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-12-22 21:57:02,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5 2023-12-22 21:57:09,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=801866.6666666666, ans=0.125 2023-12-22 21:57:15,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=801866.6666666666, ans=0.2 2023-12-22 21:57:18,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=801933.3333333334, ans=0.1 2023-12-22 21:57:19,123 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.450e-02 2023-12-22 21:57:21,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801933.3333333334, ans=0.1 2023-12-22 21:57:21,911 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:57:28,147 INFO [train.py:886] (1/4) Epoch 26, batch 1150, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4940171.64 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:57:29,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2023-12-22 21:57:32,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802000.0, ans=0.0 2023-12-22 21:57:46,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0 2023-12-22 21:57:59,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2023-12-22 21:58:09,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=802266.6666666666, ans=0.125 2023-12-22 21:58:20,321 INFO [train.py:886] (1/4) Epoch 26, batch 1200, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4947633.34 frames. 
2023-12-22 21:58:20,321 INFO [train.py:886] (1/4) Epoch 26, batch 1200, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4947633.34 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0
2023-12-22 21:58:20,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802333.3333333334, ans=0.0
2023-12-22 21:58:24,029 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 3.101e+01 3.239e+01 3.403e+01 4.197e+01, threshold=6.477e+01, percent-clipped=0.0
2023-12-22 21:58:28,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=802400.0, ans=0.2
2023-12-22 21:58:34,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5
2023-12-22 21:58:36,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=802400.0, ans=0.0
2023-12-22 21:59:02,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=802600.0, ans=0.125
2023-12-22 21:59:11,804 INFO [train.py:886] (1/4) Epoch 26, batch 1250, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4942251.11 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0
2023-12-22 21:59:14,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=802666.6666666666, ans=15.0
2023-12-22 21:59:19,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=802666.6666666666, ans=0.125
2023-12-22 21:59:33,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=802800.0, ans=0.1
2023-12-22 21:59:38,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=802800.0, ans=0.2
2023-12-22 21:59:46,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802866.6666666666, ans=0.0
2023-12-22 21:59:50,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=802866.6666666666, ans=0.0
2023-12-22 21:59:50,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0
2023-12-22 21:59:55,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=802933.3333333334, ans=0.04949747468305833
2023-12-22 21:59:59,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802933.3333333334, ans=0.0
2023-12-22 22:00:03,362 INFO [train.py:886] (1/4) Epoch 26, batch 1300, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4938909.28 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0
2023-12-22 22:00:07,853 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.752e+01 3.138e+01 3.281e+01 3.471e+01 4.385e+01, threshold=6.561e+01, percent-clipped=0.0
2023-12-22 22:00:40,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=803200.0, ans=0.125
2023-12-22 22:00:48,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=803266.6666666666, ans=0.125
2023-12-22 22:00:55,953 INFO [train.py:886] (1/4) Epoch 26, batch 1350, loss[loss=0.01403, audio_tagging_loss=0.01403, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4931302.58 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0
2023-12-22 22:01:12,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0
2023-12-22 22:01:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=803466.6666666666, ans=0.0
2023-12-22 22:01:33,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=803533.3333333334, ans=0.125
2023-12-22 22:01:48,084 INFO [train.py:886] (1/4) Epoch 26, batch 1400, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4935523.79 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0
2023-12-22 22:01:51,856 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 3.025e+01 3.160e+01 3.304e+01 3.712e+01, threshold=6.320e+01, percent-clipped=0.0
2023-12-22 22:01:52,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=803666.6666666666, ans=0.125
2023-12-22 22:02:06,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=803733.3333333334, ans=0.125
2023-12-22 22:02:14,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=803800.0, ans=0.125
2023-12-22 22:02:20,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5
2023-12-22 22:02:29,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=803933.3333333334, ans=0.0
2023-12-22 22:02:37,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=803933.3333333334, ans=0.5
2023-12-22 22:02:38,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=804000.0, ans=0.125
2023-12-22 22:02:39,171 INFO [train.py:886] (1/4) Epoch 26, batch 1450, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4942108.81 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:02:42,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0
2023-12-22 22:02:49,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=804066.6666666666, ans=0.125
2023-12-22 22:03:31,161 INFO [train.py:886] (1/4) Epoch 26, batch 1500, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4946804.16 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:03:35,695 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+01 3.046e+01 3.201e+01 3.307e+01 3.722e+01, threshold=6.402e+01, percent-clipped=0.0
2023-12-22 22:03:38,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0
2023-12-22 22:03:42,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0
2023-12-22 22:03:44,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=804400.0, ans=0.07
2023-12-22 22:03:48,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0
2023-12-22 22:03:49,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=804400.0, ans=0.0
2023-12-22 22:03:52,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=804466.6666666666, ans=0.2
2023-12-22 22:04:03,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=804533.3333333334, ans=0.2
2023-12-22 22:04:20,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0
2023-12-22 22:04:23,293 INFO [train.py:886] (1/4) Epoch 26, batch 1550, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4946352.27 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:04:29,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=804666.6666666666, ans=0.125
2023-12-22 22:04:56,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0
2023-12-22 22:05:01,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=12.0
2023-12-22 22:05:14,683 INFO [train.py:886] (1/4) Epoch 26, batch 1600, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4944653.30 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:05:18,339 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.137e+01 3.270e+01 3.392e+01 3.802e+01, threshold=6.540e+01, percent-clipped=0.0
2023-12-22 22:05:24,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=805066.6666666666, ans=0.125
2023-12-22 22:05:40,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=805133.3333333334, ans=0.125
2023-12-22 22:05:43,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5
2023-12-22 22:05:46,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=805200.0, ans=0.0
2023-12-22 22:05:57,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=805266.6666666666, ans=0.125
2023-12-22 22:06:06,088 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:06:07,032 INFO [train.py:886] (1/4) Epoch 26, batch 1650, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4943083.50 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:06:18,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=805400.0, ans=0.2
2023-12-22 22:06:22,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0
2023-12-22 22:06:37,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5
2023-12-22 22:06:47,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0
2023-12-22 22:06:57,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=805666.6666666666, ans=0.125
2023-12-22 22:06:57,930 INFO [train.py:886] (1/4) Epoch 26, batch 1700, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4946466.42 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:07:02,415 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.061e+01 3.187e+01 3.330e+01 3.966e+01, threshold=6.373e+01, percent-clipped=0.0
2023-12-22 22:07:12,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=805733.3333333334, ans=0.0
2023-12-22 22:07:22,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0
2023-12-22 22:07:25,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0
2023-12-22 22:07:33,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=805866.6666666666, ans=0.2
2023-12-22 22:07:34,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=805866.6666666666, ans=0.125
2023-12-22 22:07:34,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805866.6666666666, ans=0.1
2023-12-22 22:07:51,016 INFO [train.py:886] (1/4) Epoch 26, batch 1750, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4941633.81 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:07:56,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0
2023-12-22 22:08:02,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=806066.6666666666, ans=0.125
2023-12-22 22:08:14,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5
2023-12-22 22:08:41,209 INFO [train.py:886] (1/4) Epoch 26, batch 1800, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4952088.15 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:08:45,754 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.085e+01 3.206e+01 3.379e+01 3.984e+01, threshold=6.413e+01, percent-clipped=0.0
2023-12-22 22:08:49,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0
2023-12-22 22:08:50,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=806400.0, ans=0.0
2023-12-22 22:09:02,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0
2023-12-22 22:09:21,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=806600.0, ans=0.125
2023-12-22 22:09:30,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=806600.0, ans=0.2
2023-12-22 22:09:33,102 INFO [train.py:886] (1/4) Epoch 26, batch 1850, loss[loss=0.01654, audio_tagging_loss=0.01654, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4949492.14 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:09:47,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=806733.3333333334, ans=0.0
2023-12-22 22:09:59,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0
2023-12-22 22:10:04,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=806866.6666666666, ans=0.125
2023-12-22 22:10:10,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=806866.6666666666, ans=0.125
2023-12-22 22:10:13,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5
2023-12-22 22:10:19,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=806933.3333333334, ans=0.0
2023-12-22 22:10:25,619 INFO [train.py:886] (1/4) Epoch 26, batch 1900, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4945789.33 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:10:30,062 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.174e+01 3.329e+01 3.459e+01 4.807e+01, threshold=6.658e+01, percent-clipped=0.0
2023-12-22 22:10:39,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=807066.6666666666, ans=0.125
2023-12-22 22:10:42,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=807066.6666666666, ans=0.025
2023-12-22 22:10:46,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807133.3333333334, ans=0.125
2023-12-22 22:10:52,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=807133.3333333334, ans=0.125
2023-12-22 22:10:58,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807200.0, ans=0.125
2023-12-22 22:11:13,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=15.0
2023-12-22 22:11:16,646 INFO [train.py:886] (1/4) Epoch 26, batch 1950, loss[loss=0.009933, audio_tagging_loss=0.009933, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4939378.41 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0
2023-12-22 22:11:20,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=807333.3333333334, ans=0.07
2023-12-22 22:11:40,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=807466.6666666666, ans=0.0
2023-12-22 22:11:51,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807533.3333333334, ans=0.1
2023-12-22 22:11:55,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807533.3333333334, ans=0.1
2023-12-22 22:11:56,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=807533.3333333334, ans=0.0
2023-12-22 22:12:09,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0
2023-12-22 22:12:10,194 INFO [train.py:886] (1/4) Epoch 26, batch 2000, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4940213.39 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 64.0
2023-12-22 22:12:14,054 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.056e+01 3.196e+01 3.415e+01 4.184e+01, threshold=6.392e+01, percent-clipped=0.0
2023-12-22 22:12:16,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=807666.6666666666, ans=0.125
2023-12-22 22:12:24,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=807733.3333333334, ans=0.0
2023-12-22 22:12:46,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=807866.6666666666, ans=0.125
2023-12-22 22:12:52,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807933.3333333334, ans=0.125
2023-12-22 22:12:55,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807933.3333333334, ans=0.1
2023-12-22 22:13:02,223 INFO [train.py:886] (1/4) Epoch 26, batch 2050, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4946879.22 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:13:13,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=808066.6666666666, ans=0.0
2023-12-22 22:13:19,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=808066.6666666666, ans=0.1
2023-12-22 22:13:37,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=808200.0, ans=0.0
2023-12-22 22:13:52,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=808333.3333333334, ans=0.125
2023-12-22 22:13:53,233 INFO [train.py:886] (1/4) Epoch 26, batch 2100, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4948373.35 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:13:56,969 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.782e+01 3.109e+01 3.258e+01 3.404e+01 4.082e+01, threshold=6.517e+01, percent-clipped=0.0
2023-12-22 22:14:06,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=808400.0, ans=0.0
2023-12-22 22:14:06,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0
2023-12-22 22:14:08,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=808400.0, ans=0.0
2023-12-22 22:14:35,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=808600.0, ans=0.125
2023-12-22 22:14:44,688 INFO [train.py:886] (1/4) Epoch 26, batch 2150, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4954492.90 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:14:47,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=808666.6666666666, ans=0.125
2023-12-22 22:14:48,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=808666.6666666666, ans=12.0
2023-12-22 22:14:57,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=808733.3333333334, ans=0.125
2023-12-22 22:15:18,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=808866.6666666666, ans=0.0
2023-12-22 22:15:20,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=808866.6666666666, ans=0.0
2023-12-22 22:15:36,297 INFO [train.py:886] (1/4) Epoch 26, batch 2200, loss[loss=0.01108, audio_tagging_loss=0.01108, over 23975.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4946920.86 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:15:40,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=809000.0, ans=0.0
2023-12-22 22:15:40,870 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.112e+01 3.281e+01 3.454e+01 3.979e+01, threshold=6.563e+01, percent-clipped=0.0
2023-12-22 22:15:42,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=809000.0, ans=0.0
2023-12-22 22:15:46,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809066.6666666666, ans=0.1
2023-12-22 22:16:08,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=809200.0, ans=0.125
2023-12-22 22:16:28,682 INFO [train.py:886] (1/4) Epoch 26, batch 2250, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24750.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4938336.42 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:16:31,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=809333.3333333334, ans=0.125
2023-12-22 22:16:46,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=809400.0, ans=0.125
2023-12-22 22:16:52,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=809466.6666666666, ans=0.2
2023-12-22 22:16:59,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0
2023-12-22 22:17:00,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=809533.3333333334, ans=0.125
2023-12-22 22:17:02,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-12-22 22:17:04,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=809533.3333333334, ans=0.0
2023-12-22 22:17:05,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=809533.3333333334, ans=0.125
2023-12-22 22:17:19,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809666.6666666666, ans=0.1
2023-12-22 22:17:20,281 INFO [train.py:886] (1/4) Epoch 26, batch 2300, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4943503.51 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:17:23,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=809666.6666666666, ans=0.0
2023-12-22 22:17:24,685 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.740e+01 3.089e+01 3.215e+01 3.416e+01 4.133e+01, threshold=6.430e+01, percent-clipped=0.0
2023-12-22 22:17:31,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0
2023-12-22 22:17:54,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809866.6666666666, ans=0.1
2023-12-22 22:17:58,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=809866.6666666666, ans=0.125
2023-12-22 22:18:07,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=809933.3333333334, ans=0.0
2023-12-22 22:18:11,851 INFO [train.py:886] (1/4) Epoch 26, batch 2350, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4940578.98 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:18:14,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0
2023-12-22 22:18:46,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5
2023-12-22 22:19:03,983 INFO [train.py:886] (1/4) Epoch 26, batch 2400, loss[loss=0.01057, audio_tagging_loss=0.01057, over 21405.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4944097.33 frames. ], batch size: 107, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:19:07,784 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.847e+01 3.068e+01 3.224e+01 3.358e+01 4.468e+01, threshold=6.448e+01, percent-clipped=0.0
2023-12-22 22:19:09,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=810333.3333333334, ans=0.125
2023-12-22 22:19:24,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5
2023-12-22 22:19:46,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810600.0, ans=0.1
2023-12-22 22:19:56,223 INFO [train.py:886] (1/4) Epoch 26, batch 2450, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4948016.28 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:20:08,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810733.3333333334, ans=0.1
2023-12-22 22:20:11,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=810733.3333333334, ans=0.05
2023-12-22 22:20:17,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=810800.0, ans=0.125
2023-12-22 22:20:20,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=810800.0, ans=0.125
2023-12-22 22:20:31,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810866.6666666666, ans=0.1
2023-12-22 22:20:37,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=810933.3333333334, ans=0.125
2023-12-22 22:20:38,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=810933.3333333334, ans=0.0
2023-12-22 22:20:47,803 INFO [train.py:886] (1/4) Epoch 26, batch 2500, loss[loss=0.01669, audio_tagging_loss=0.01669, over 24953.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4946313.98 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:20:50,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811000.0, ans=0.1
2023-12-22 22:20:50,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=811000.0, ans=0.125
2023-12-22 22:20:50,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=811000.0, ans=0.04949747468305833
2023-12-22 22:20:52,294 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.780e+01 3.125e+01 3.301e+01 3.409e+01 3.789e+01, threshold=6.601e+01, percent-clipped=0.0
2023-12-22 22:21:01,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=811066.6666666666, ans=0.0
2023-12-22 22:21:17,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=811200.0, ans=0.125
2023-12-22 22:21:21,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=811200.0, ans=0.125
2023-12-22 22:21:27,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=811266.6666666666, ans=0.0
2023-12-22 22:21:27,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=811266.6666666666, ans=0.5
2023-12-22 22:21:29,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=811266.6666666666, ans=0.0
2023-12-22 22:21:33,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811266.6666666666, ans=0.1
2023-12-22 22:21:38,990 INFO [train.py:886] (1/4) Epoch 26, batch 2550, loss[loss=0.0135, audio_tagging_loss=0.0135, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4943693.65 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0
2023-12-22 22:21:45,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=811333.3333333334, ans=0.125
2023-12-22 22:21:49,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5
2023-12-22 22:21:53,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=811400.0, ans=0.05
2023-12-22 22:22:11,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=811533.3333333334, ans=0.0
2023-12-22 22:22:12,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.60 vs. limit=15.0
2023-12-22 22:22:15,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=811533.3333333334, ans=0.5
2023-12-22 22:22:19,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=811600.0, ans=0.0
2023-12-22 22:22:24,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=23.54 vs. limit=22.5
2023-12-22 22:22:24,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=811600.0, ans=0.0
2023-12-22 22:22:30,091 INFO [train.py:886] (1/4) Epoch 26, batch 2600, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4940091.65 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:22:35,137 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.819e+01 3.115e+01 3.236e+01 3.408e+01 3.889e+01, threshold=6.471e+01, percent-clipped=0.0
2023-12-22 22:22:46,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0
2023-12-22 22:22:56,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=811800.0, ans=0.125
2023-12-22 22:22:59,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=811800.0, ans=0.125
2023-12-22 22:23:02,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811866.6666666666, ans=0.1
2023-12-22 22:23:18,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0
2023-12-22 22:23:22,388 INFO [train.py:886] (1/4) Epoch 26, batch 2650, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4944056.32 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:23:26,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=812000.0, ans=0.0
2023-12-22 22:23:48,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=812133.3333333334, ans=0.0
2023-12-22 22:24:14,625 INFO [train.py:886] (1/4) Epoch 26, batch 2700, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4950105.87 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:24:18,367 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.764e+01 3.107e+01 3.255e+01 3.402e+01 3.998e+01, threshold=6.509e+01, percent-clipped=0.0
2023-12-22 22:24:33,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=812466.6666666666, ans=0.125
2023-12-22 22:24:46,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=812533.3333333334, ans=0.0
2023-12-22 22:24:53,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812533.3333333334, ans=0.1
2023-12-22 22:25:05,574 INFO [train.py:886] (1/4) Epoch 26, batch 2750, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4957993.56 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:25:08,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0
2023-12-22 22:25:14,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=812666.6666666666, ans=0.05
2023-12-22 22:25:31,625 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:25:50,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=812933.3333333334, ans=0.125
2023-12-22 22:25:50,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=812933.3333333334, ans=0.125
2023-12-22 22:25:58,394 INFO [train.py:886] (1/4) Epoch 26, batch 2800, loss[loss=0.01555, audio_tagging_loss=0.01555, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4957239.05 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:26:02,085 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.802e+01 3.113e+01 3.268e+01 3.447e+01 3.793e+01, threshold=6.536e+01, percent-clipped=0.0
2023-12-22 22:26:33,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813200.0, ans=0.125
2023-12-22 22:26:44,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=813266.6666666666, ans=0.125
2023-12-22 22:26:47,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=813266.6666666666, ans=0.125
2023-12-22 22:26:48,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=813333.3333333334, ans=0.09899494936611666
2023-12-22 22:26:48,957 INFO [train.py:886] (1/4) Epoch 26, batch 2850, loss[loss=0.01228, audio_tagging_loss=0.01228, over 21816.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4944595.51 frames. ], batch size: 107, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:26:59,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=813400.0, ans=0.125
2023-12-22 22:27:01,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.53 vs. limit=5.0
2023-12-22 22:27:06,437 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:27:20,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=813533.3333333334, ans=0.2
2023-12-22 22:27:29,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813533.3333333334, ans=0.1
2023-12-22 22:27:30,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813600.0, ans=0.1
2023-12-22 22:27:35,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=813600.0, ans=0.125
2023-12-22 22:27:35,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=813600.0, ans=0.125
2023-12-22 22:27:40,979 INFO [train.py:886] (1/4) Epoch 26, batch 2900, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4943777.47 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:27:44,732 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 3.068e+01 3.241e+01 3.417e+01 3.879e+01, threshold=6.482e+01, percent-clipped=0.0
2023-12-22 22:27:59,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=813733.3333333334, ans=0.0
2023-12-22 22:28:17,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=813866.6666666666, ans=0.125
2023-12-22 22:28:18,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0
2023-12-22 22:28:33,264 INFO [train.py:886] (1/4) Epoch 26, batch 2950, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4944166.10 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:28:33,545 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:28:34,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5
2023-12-22 22:28:37,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=814000.0, ans=0.125
2023-12-22 22:28:40,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=814000.0, ans=0.2
2023-12-22 22:28:51,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0
2023-12-22 22:28:53,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0
2023-12-22 22:29:02,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=814133.3333333334, ans=0.125
2023-12-22 22:29:03,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=814200.0, ans=0.125
2023-12-22 22:29:13,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814200.0, ans=0.125
2023-12-22 22:29:24,370 INFO [train.py:886] (1/4) Epoch 26, batch 3000, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4952770.56 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:29:24,370 INFO [train.py:909] (1/4) Computing validation loss
2023-12-22 22:29:45,070 INFO [train.py:917] (1/4) Epoch 26, validation: loss=0.03227, audio_tagging_loss=0.03227, over 3737520.00 frames.
2023-12-22 22:29:45,070 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-22 22:29:48,817 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.069e+01 3.220e+01 3.381e+01 3.833e+01, threshold=6.439e+01, percent-clipped=0.0
2023-12-22 22:29:51,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=814333.3333333334, ans=0.015
2023-12-22 22:30:01,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=814400.0, ans=0.2
2023-12-22 22:30:30,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=814600.0, ans=0.0
2023-12-22 22:30:36,600 INFO [train.py:886] (1/4) Epoch 26, batch 3050, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4953414.67 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:30:38,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0
2023-12-22 22:30:46,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814733.3333333334, ans=0.1
2023-12-22 22:30:53,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0
2023-12-22 22:31:08,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=814866.6666666666, ans=0.05
2023-12-22 22:31:28,247 INFO [train.py:886] (1/4) Epoch 26, batch 3100, loss[loss=0.01161, audio_tagging_loss=0.01161, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4956819.74 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:31:29,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0
2023-12-22 22:31:29,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0
2023-12-22 22:31:32,717 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.774e+01 3.119e+01 3.253e+01 3.438e+01 3.800e+01, threshold=6.506e+01, percent-clipped=0.0
2023-12-22 22:31:45,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=815066.6666666666, ans=0.2
2023-12-22 22:31:55,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=815133.3333333334, ans=0.0
2023-12-22 22:31:56,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=815133.3333333334, ans=0.125
2023-12-22 22:32:04,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0
2023-12-22 22:32:08,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 22:32:21,416 INFO [train.py:886] (1/4) Epoch 26, batch 3150, loss[loss=0.01268, audio_tagging_loss=0.01268, over 22620.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4945936.32 frames. ], batch size: 107, lr: 4.15e-03, grad_scale: 64.0
2023-12-22 22:32:31,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815400.0, ans=0.1
2023-12-22 22:32:54,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=815533.3333333334, ans=0.0
2023-12-22 22:33:00,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=815533.3333333334, ans=0.04949747468305833
2023-12-22 22:33:13,059 INFO [train.py:886] (1/4) Epoch 26, batch 3200, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24935.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4944034.44 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:33:16,942 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.137e+01 3.242e+01 3.409e+01 4.105e+01, threshold=6.485e+01, percent-clipped=0.0
2023-12-22 22:33:20,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=815666.6666666666, ans=0.0
2023-12-22 22:33:21,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=815666.6666666666, ans=0.125
2023-12-22 22:33:26,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815733.3333333334, ans=0.125
2023-12-22 22:33:53,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=815933.3333333334, ans=0.125
2023-12-22 22:33:59,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=815933.3333333334, ans=0.0
2023-12-22 22:34:01,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0
2023-12-22 22:34:04,555 INFO [train.py:886] (1/4) Epoch 26, batch 3250, loss[loss=0.01524, audio_tagging_loss=0.01524, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4946394.48 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:34:12,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=816000.0, ans=0.125
2023-12-22 22:34:26,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816133.3333333334, ans=0.1
2023-12-22 22:34:35,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=15.0
2023-12-22 22:34:53,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=816266.6666666666, ans=0.2
2023-12-22 22:34:55,459 INFO [train.py:886] (1/4) Epoch 26, batch 3300, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4944773.47 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:35:00,013 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.721e+01 3.062e+01 3.233e+01 3.383e+01 3.983e+01, threshold=6.466e+01, percent-clipped=0.0
2023-12-22 22:35:10,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=816400.0, ans=0.125
2023-12-22 22:35:12,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816400.0, ans=0.1
2023-12-22 22:35:14,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=816400.0, ans=0.0
2023-12-22 22:35:28,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=816533.3333333334, ans=0.0
2023-12-22 22:35:40,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=816600.0, ans=0.0
2023-12-22 22:35:41,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=816600.0, ans=0.125
2023-12-22 22:35:47,755 INFO [train.py:886] (1/4) Epoch 26, batch 3350, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4947858.61 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:35:58,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816733.3333333334, ans=0.1
2023-12-22 22:36:26,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=816866.6666666666, ans=0.2
2023-12-22 22:36:27,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816866.6666666666, ans=0.1
2023-12-22 22:36:36,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=816933.3333333334, ans=0.0
2023-12-22 22:36:39,805 INFO [train.py:886] (1/4) Epoch 26, batch 3400, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4948473.99 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:36:44,327 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.878e+01 3.135e+01 3.242e+01 3.478e+01 3.841e+01, threshold=6.483e+01, percent-clipped=0.0
2023-12-22 22:36:48,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0
2023-12-22 22:37:31,878 INFO [train.py:886] (1/4) Epoch 26, batch 3450, loss[loss=0.01485, audio_tagging_loss=0.01485, over 24750.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4941415.96 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:37:42,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0
2023-12-22 22:38:02,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=817533.3333333334, ans=0.125
2023-12-22 22:38:23,645 INFO [train.py:886] (1/4) Epoch 26, batch 3500, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4942443.29 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0
2023-12-22 22:38:28,154 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.949e+01 3.137e+01 3.263e+01 3.427e+01 4.088e+01, threshold=6.525e+01, percent-clipped=0.0
2023-12-22 22:38:37,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=817733.3333333334, ans=0.09899494936611666
2023-12-22 22:38:38,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=817733.3333333334, ans=0.0
2023-12-22 22:38:39,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=817733.3333333334, ans=0.2
2023-12-22 22:38:40,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=817733.3333333334, ans=0.0
2023-12-22 22:38:40,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0
2023-12-22 22:38:48,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=817800.0, ans=0.0
2023-12-22 22:39:06,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=817933.3333333334, ans=0.0
2023-12-22 22:39:13,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=817933.3333333334, ans=0.125
], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:39:22,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=818000.0, ans=0.2 2023-12-22 22:39:40,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=818133.3333333334, ans=0.125 2023-12-22 22:39:58,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=818266.6666666666, ans=0.0 2023-12-22 22:39:59,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=818266.6666666666, ans=0.125 2023-12-22 22:40:08,440 INFO [train.py:886] (1/4) Epoch 26, batch 3600, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4936889.91 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:40:12,304 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 3.075e+01 3.238e+01 3.372e+01 3.764e+01, threshold=6.477e+01, percent-clipped=0.0 2023-12-22 22:40:15,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0 2023-12-22 22:40:15,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.21 vs. limit=10.0 2023-12-22 22:40:16,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=818333.3333333334, ans=0.0 2023-12-22 22:40:31,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818466.6666666666, ans=0.1 2023-12-22 22:40:55,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=818600.0, ans=0.0 2023-12-22 22:40:56,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=818600.0, ans=0.125 2023-12-22 22:41:00,183 INFO [train.py:886] (1/4) Epoch 26, batch 3650, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4941897.53 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:41:08,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=818666.6666666666, ans=0.0 2023-12-22 22:41:10,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=818733.3333333334, ans=0.04949747468305833 2023-12-22 22:41:14,829 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:41:16,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=818733.3333333334, ans=0.035 2023-12-22 22:41:24,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=818800.0, ans=0.0 2023-12-22 22:41:24,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.51 vs. 
limit=15.0 2023-12-22 22:41:31,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2023-12-22 22:41:33,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2023-12-22 22:41:34,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2023-12-22 22:41:52,792 INFO [train.py:886] (1/4) Epoch 26, batch 3700, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4948059.85 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:41:52,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=819000.0, ans=0.125 2023-12-22 22:41:56,506 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+01 3.107e+01 3.222e+01 3.395e+01 4.051e+01, threshold=6.444e+01, percent-clipped=0.0 2023-12-22 22:41:57,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2023-12-22 22:42:00,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-12-22 22:42:06,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=819066.6666666666, ans=0.2 2023-12-22 22:42:20,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=819133.3333333334, ans=0.04949747468305833 2023-12-22 22:42:39,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=819266.6666666666, ans=0.2 2023-12-22 22:42:43,625 INFO [train.py:886] (1/4) Epoch 26, batch 3750, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4949351.70 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:42:48,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2023-12-22 22:43:07,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2023-12-22 22:43:08,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=819466.6666666666, ans=0.2 2023-12-22 22:43:18,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=819533.3333333334, ans=0.125 2023-12-22 22:43:23,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2023-12-22 22:43:24,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=819533.3333333334, ans=0.0 2023-12-22 22:43:35,752 INFO [train.py:886] (1/4) Epoch 26, batch 3800, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. 
], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4940802.50 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:43:39,547 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+01 3.124e+01 3.288e+01 3.411e+01 4.142e+01, threshold=6.577e+01, percent-clipped=0.0 2023-12-22 22:43:50,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=819733.3333333334, ans=0.2 2023-12-22 22:43:52,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819733.3333333334, ans=0.1 2023-12-22 22:43:55,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=819733.3333333334, ans=0.125 2023-12-22 22:44:00,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=819800.0, ans=0.2 2023-12-22 22:44:28,476 INFO [train.py:886] (1/4) Epoch 26, batch 3850, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4939066.81 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:44:35,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5 2023-12-22 22:44:48,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820133.3333333334, ans=0.1 2023-12-22 22:44:59,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=820200.0, ans=0.125 2023-12-22 22:45:19,633 INFO [train.py:886] (1/4) Epoch 26, batch 3900, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4933914.27 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:45:23,398 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+01 3.084e+01 3.268e+01 3.397e+01 4.255e+01, threshold=6.537e+01, percent-clipped=0.0 2023-12-22 22:45:32,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0 2023-12-22 22:45:35,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2023-12-22 22:46:10,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=820600.0, ans=0.0 2023-12-22 22:46:11,693 INFO [train.py:886] (1/4) Epoch 26, batch 3950, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4944116.84 frames. 
], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:46:27,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=820733.3333333334, ans=0.0 2023-12-22 22:46:54,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=820933.3333333334, ans=0.125 2023-12-22 22:46:57,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-12-22 22:47:03,351 INFO [train.py:886] (1/4) Epoch 26, batch 4000, loss[loss=0.01032, audio_tagging_loss=0.01032, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4949499.65 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 128.0 2023-12-22 22:47:07,818 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.131e+01 3.262e+01 3.397e+01 4.571e+01, threshold=6.525e+01, percent-clipped=0.0 2023-12-22 22:47:17,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2023-12-22 22:47:20,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=821066.6666666666, ans=0.125 2023-12-22 22:47:21,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.35 vs. limit=15.0 2023-12-22 22:47:36,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=821200.0, ans=0.125 2023-12-22 22:47:41,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=821200.0, ans=0.04949747468305833 2023-12-22 22:47:50,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=821266.6666666666, ans=0.125 2023-12-22 22:47:55,606 INFO [train.py:886] (1/4) Epoch 26, batch 4050, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4950974.71 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:47:58,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=821333.3333333334, ans=0.125 2023-12-22 22:48:08,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=821400.0, ans=0.2 2023-12-22 22:48:13,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. 
limit=15.0 2023-12-22 22:48:24,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=821466.6666666666, ans=0.02 2023-12-22 22:48:26,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=821533.3333333334, ans=0.2 2023-12-22 22:48:36,716 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:48:37,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=821600.0, ans=0.0 2023-12-22 22:48:42,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5 2023-12-22 22:48:47,406 INFO [train.py:886] (1/4) Epoch 26, batch 4100, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4946377.73 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:48:49,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=821666.6666666666, ans=0.0 2023-12-22 22:48:51,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=821666.6666666666, ans=0.125 2023-12-22 22:48:52,857 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.137e+01 3.284e+01 3.434e+01 4.127e+01, threshold=6.569e+01, percent-clipped=0.0 2023-12-22 22:49:09,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=821800.0, ans=0.0 2023-12-22 22:49:12,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=821800.0, ans=0.0 2023-12-22 22:49:30,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. limit=10.0 2023-12-22 22:49:38,962 INFO [train.py:886] (1/4) Epoch 26, batch 4150, loss[loss=0.01497, audio_tagging_loss=0.01497, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4946094.91 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:49:56,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-12-22 22:50:06,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0 2023-12-22 22:50:14,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=822200.0, ans=0.125 2023-12-22 22:50:17,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2023-12-22 22:50:22,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=822266.6666666666, ans=0.0 2023-12-22 22:50:23,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.87 vs. 
limit=10.0 2023-12-22 22:50:32,185 INFO [train.py:886] (1/4) Epoch 26, batch 4200, loss[loss=0.0151, audio_tagging_loss=0.0151, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4945422.95 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:50:36,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=822333.3333333334, ans=0.09899494936611666 2023-12-22 22:50:37,014 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+01 3.069e+01 3.218e+01 3.405e+01 4.073e+01, threshold=6.437e+01, percent-clipped=0.0 2023-12-22 22:51:08,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=822533.3333333334, ans=0.125 2023-12-22 22:51:13,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2023-12-22 22:51:18,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822600.0, ans=0.1 2023-12-22 22:51:23,438 INFO [train.py:886] (1/4) Epoch 26, batch 4250, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4948733.72 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:51:48,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822800.0, ans=0.1 2023-12-22 22:51:49,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=822800.0, ans=0.0 2023-12-22 22:51:56,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822866.6666666666, ans=0.1 2023-12-22 22:52:14,951 INFO [train.py:886] (1/4) Epoch 26, batch 4300, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4950738.49 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:52:19,682 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.718e+01 3.099e+01 3.212e+01 3.358e+01 3.889e+01, threshold=6.424e+01, percent-clipped=0.0 2023-12-22 22:52:21,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2023-12-22 22:52:24,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=823066.6666666666, ans=0.2 2023-12-22 22:52:28,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0 2023-12-22 22:52:53,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2023-12-22 22:52:54,919 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:53:06,680 INFO [train.py:886] (1/4) Epoch 26, batch 4350, loss[loss=0.007682, audio_tagging_loss=0.007682, over 24056.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4952558.18 frames. 
], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:53:23,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823400.0, ans=0.125 2023-12-22 22:53:24,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823400.0, ans=0.125 2023-12-22 22:53:28,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=823466.6666666666, ans=0.2 2023-12-22 22:53:32,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=823466.6666666666, ans=0.125 2023-12-22 22:53:37,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=823533.3333333334, ans=0.125 2023-12-22 22:53:45,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=823533.3333333334, ans=10.0 2023-12-22 22:53:54,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=823600.0, ans=0.0 2023-12-22 22:53:59,350 INFO [train.py:886] (1/4) Epoch 26, batch 4400, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4947942.48 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:54:04,078 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.224e+01 3.341e+01 3.511e+01 3.923e+01, threshold=6.682e+01, percent-clipped=0.0 2023-12-22 22:54:11,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-12-22 22:54:19,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823800.0, ans=0.1 2023-12-22 22:54:19,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=823800.0, ans=0.95 2023-12-22 22:54:28,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=12.0 2023-12-22 22:54:30,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=823866.6666666666, ans=0.2 2023-12-22 22:54:34,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-22 22:54:39,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823933.3333333334, ans=0.125 2023-12-22 22:54:47,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823933.3333333334, ans=0.1 2023-12-22 22:54:51,661 INFO [train.py:886] (1/4) Epoch 26, batch 4450, loss[loss=0.01152, audio_tagging_loss=0.01152, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4944293.55 frames. 
], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:54:56,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=824000.0, ans=0.125 2023-12-22 22:54:57,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=824000.0, ans=0.125 2023-12-22 22:55:04,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=824066.6666666666, ans=0.0 2023-12-22 22:55:05,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=824066.6666666666, ans=0.0 2023-12-22 22:55:20,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=824133.3333333334, ans=0.125 2023-12-22 22:55:43,464 INFO [train.py:886] (1/4) Epoch 26, batch 4500, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4949251.29 frames. ], batch size: 99, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:55:48,315 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.106e+01 3.253e+01 3.392e+01 3.736e+01, threshold=6.505e+01, percent-clipped=0.0 2023-12-22 22:55:54,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=824400.0, ans=0.0 2023-12-22 22:55:56,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2023-12-22 22:56:06,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=824466.6666666666, ans=0.04949747468305833 2023-12-22 22:56:17,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=824533.3333333334, ans=0.0 2023-12-22 22:56:24,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=824600.0, ans=0.125 2023-12-22 22:56:35,058 INFO [train.py:886] (1/4) Epoch 26, batch 4550, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4955345.84 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:56:36,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=824666.6666666666, ans=0.125 2023-12-22 22:56:50,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=824733.3333333334, ans=0.2 2023-12-22 22:56:53,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=824733.3333333334, ans=0.125 2023-12-22 22:56:56,342 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:57:02,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. 
limit=15.0 2023-12-22 22:57:17,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=824933.3333333334, ans=0.125 2023-12-22 22:57:20,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2023-12-22 22:57:26,767 INFO [train.py:886] (1/4) Epoch 26, batch 4600, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24895.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4950098.28 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:57:32,097 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.119e+01 3.242e+01 3.395e+01 3.973e+01, threshold=6.484e+01, percent-clipped=0.0 2023-12-22 22:57:33,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=825000.0, ans=0.0 2023-12-22 22:57:33,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-12-22 22:57:38,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=825066.6666666666, ans=0.0 2023-12-22 22:58:00,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=825200.0, ans=0.0 2023-12-22 22:58:19,445 INFO [train.py:886] (1/4) Epoch 26, batch 4650, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4956356.56 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:58:30,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=825400.0, ans=0.05 2023-12-22 22:58:38,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=825466.6666666666, ans=0.125 2023-12-22 22:58:47,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=825466.6666666666, ans=0.2 2023-12-22 22:58:59,064 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:59:06,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=825600.0, ans=0.0 2023-12-22 22:59:09,847 INFO [train.py:886] (1/4) Epoch 26, batch 4700, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4954600.80 frames. 
], batch size: 99, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:59:14,958 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.179e+01 3.317e+01 3.439e+01 3.833e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-22 22:59:17,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=825666.6666666666, ans=0.125 2023-12-22 22:59:20,743 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:59:21,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=825733.3333333334, ans=0.0 2023-12-22 22:59:23,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=825733.3333333334, ans=0.125 2023-12-22 22:59:32,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=825800.0, ans=0.125 2023-12-22 22:59:53,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=825933.3333333334, ans=0.0 2023-12-22 22:59:57,204 INFO [train.py:886] (1/4) Epoch 26, batch 4750, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4945840.22 frames. ], batch size: 99, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 23:00:32,997 INFO [train.py:886] (1/4) Epoch 27, batch 0, loss[loss=0.03733, audio_tagging_loss=0.03733, over 20602.00 frames. ], tot_loss[loss=0.03733, audio_tagging_loss=0.03733, over 20602.00 frames. ], batch size: 107, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:00:32,997 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 23:00:46,039 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3288, 4.5559, 5.2031, 4.7104], device='cuda:1') 2023-12-22 23:00:51,750 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5581, 3.9777, 4.0860, 3.5159], device='cuda:1') 2023-12-22 23:00:53,970 INFO [train.py:917] (1/4) Epoch 27, validation: loss=0.03314, audio_tagging_loss=0.03314, over 3737520.00 frames. 2023-12-22 23:00:53,970 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 23:00:55,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826106.6666666666, ans=0.125 2023-12-22 23:00:58,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-22 23:01:22,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=826240.0, ans=0.07 2023-12-22 23:01:27,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=826306.6666666666, ans=0.2 2023-12-22 23:01:31,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=826306.6666666666, ans=0.0 2023-12-22 23:01:33,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.42 vs. 
limit=22.5 2023-12-22 23:01:33,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=826306.6666666666, ans=0.0 2023-12-22 23:01:35,331 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.888e+01 3.296e+01 3.604e+01 4.648e+01 9.057e+01, threshold=7.208e+01, percent-clipped=9.0 2023-12-22 23:01:44,720 INFO [train.py:886] (1/4) Epoch 27, batch 50, loss[loss=0.01703, audio_tagging_loss=0.01703, over 25000.00 frames. ], tot_loss[loss=0.02035, audio_tagging_loss=0.02035, over 1119541.42 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:01:53,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=826440.0, ans=0.07 2023-12-22 23:02:03,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=826506.6666666666, ans=0.2 2023-12-22 23:02:03,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826506.6666666666, ans=0.1 2023-12-22 23:02:15,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2023-12-22 23:02:22,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2023-12-22 23:02:31,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=12.0 2023-12-22 23:02:39,488 INFO [train.py:886] (1/4) Epoch 27, batch 100, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.018, audio_tagging_loss=0.018, over 1977111.01 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:02:45,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=826773.3333333334, ans=0.0 2023-12-22 23:02:46,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=826773.3333333334, ans=0.0 2023-12-22 23:02:54,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=826840.0, ans=0.125 2023-12-22 23:02:57,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=826906.6666666666, ans=0.05 2023-12-22 23:03:10,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826973.3333333334, ans=0.125 2023-12-22 23:03:12,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=826973.3333333334, ans=0.1 2023-12-22 23:03:12,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.16 vs. 
limit=15.0 2023-12-22 23:03:19,781 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 3.375e+01 3.585e+01 3.841e+01 4.539e+01, threshold=7.169e+01, percent-clipped=0.0 2023-12-22 23:03:28,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=827106.6666666666, ans=0.125 2023-12-22 23:03:29,224 INFO [train.py:886] (1/4) Epoch 27, batch 150, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.01638, audio_tagging_loss=0.01638, over 2638926.18 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:03:52,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=827240.0, ans=0.2 2023-12-22 23:03:56,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.35 vs. limit=15.0 2023-12-22 23:04:08,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=827306.6666666666, ans=0.1 2023-12-22 23:04:13,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827373.3333333334, ans=0.125 2023-12-22 23:04:15,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=827373.3333333334, ans=0.07 2023-12-22 23:04:21,168 INFO [train.py:886] (1/4) Epoch 27, batch 200, loss[loss=0.01485, audio_tagging_loss=0.01485, over 24750.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 3160415.23 frames. ], batch size: 99, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:04:23,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=827440.0, ans=0.2 2023-12-22 23:04:37,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=827506.6666666666, ans=0.0 2023-12-22 23:04:40,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=827573.3333333334, ans=0.0 2023-12-22 23:04:45,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827573.3333333334, ans=0.125 2023-12-22 23:05:01,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=827706.6666666666, ans=0.04949747468305833 2023-12-22 23:05:02,031 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+01 3.150e+01 3.268e+01 3.453e+01 3.797e+01, threshold=6.536e+01, percent-clipped=0.0 2023-12-22 23:05:05,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0 2023-12-22 23:05:12,180 INFO [train.py:886] (1/4) Epoch 27, batch 250, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 3559590.30 frames. 
], batch size: 100, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:05:13,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=827773.3333333334, ans=0.125 2023-12-22 23:05:46,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827973.3333333334, ans=0.125 2023-12-22 23:05:47,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=827973.3333333334, ans=0.125 2023-12-22 23:05:50,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=827973.3333333334, ans=12.0 2023-12-22 23:05:52,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=827973.3333333334, ans=0.2 2023-12-22 23:05:57,355 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:05:58,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=828040.0, ans=0.125 2023-12-22 23:06:01,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=828040.0, ans=0.125 2023-12-22 23:06:04,756 INFO [train.py:886] (1/4) Epoch 27, batch 300, loss[loss=0.01517, audio_tagging_loss=0.01517, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 3866553.54 frames. ], batch size: 99, lr: 4.04e-03, grad_scale: 32.0 2023-12-22 23:06:45,461 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.889e+01 3.171e+01 3.297e+01 3.491e+01 3.918e+01, threshold=6.593e+01, percent-clipped=0.0 2023-12-22 23:06:51,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=828373.3333333334, ans=0.09899494936611666 2023-12-22 23:06:56,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-12-22 23:06:57,086 INFO [train.py:886] (1/4) Epoch 27, batch 350, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4107558.38 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:07:22,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=828573.3333333334, ans=0.0 2023-12-22 23:07:22,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=828573.3333333334, ans=0.125 2023-12-22 23:07:24,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=828573.3333333334, ans=0.125 2023-12-22 23:07:30,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=828640.0, ans=0.125 2023-12-22 23:07:44,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2023-12-22 23:07:47,934 INFO [train.py:886] (1/4) Epoch 27, batch 400, loss[loss=0.01198, audio_tagging_loss=0.01198, over 22168.00 frames. 
], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4290004.67 frames. ], batch size: 107, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:07:50,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=828773.3333333334, ans=15.0 2023-12-22 23:07:54,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=828773.3333333334, ans=0.025 2023-12-22 23:08:11,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-12-22 23:08:22,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=828973.3333333334, ans=0.0 2023-12-22 23:08:30,060 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.745e+01 3.115e+01 3.244e+01 3.373e+01 3.819e+01, threshold=6.489e+01, percent-clipped=0.0 2023-12-22 23:08:40,205 INFO [train.py:886] (1/4) Epoch 27, batch 450, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4441075.40 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:08:50,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=829173.3333333334, ans=0.0 2023-12-22 23:08:56,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=829173.3333333334, ans=0.0 2023-12-22 23:09:03,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=829240.0, ans=0.07 2023-12-22 23:09:22,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=829373.3333333334, ans=0.0 2023-12-22 23:09:31,752 INFO [train.py:886] (1/4) Epoch 27, batch 500, loss[loss=0.01662, audio_tagging_loss=0.01662, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4553320.02 frames. 
], batch size: 99, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:09:31,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=829440.0, ans=0.2 2023-12-22 23:09:37,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=829440.0, ans=0.125 2023-12-22 23:09:44,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=829506.6666666666, ans=0.125 2023-12-22 23:10:01,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=829640.0, ans=0.125 2023-12-22 23:10:01,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=829640.0, ans=0.0 2023-12-22 23:10:01,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=829640.0, ans=0.95 2023-12-22 23:10:07,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=829640.0, ans=0.0 2023-12-22 23:10:13,347 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.842e+01 3.068e+01 3.193e+01 3.340e+01 3.913e+01, threshold=6.386e+01, percent-clipped=0.0 2023-12-22 23:10:16,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=829706.6666666666, ans=0.125 2023-12-22 23:10:20,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=829706.6666666666, ans=0.0 2023-12-22 23:10:23,521 INFO [train.py:886] (1/4) Epoch 27, batch 550, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4640771.59 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:10:43,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2023-12-22 23:10:45,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=829906.6666666666, ans=0.0 2023-12-22 23:10:50,517 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:10:53,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=829906.6666666666, ans=0.0 2023-12-22 23:11:10,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=830040.0, ans=0.125 2023-12-22 23:11:13,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2023-12-22 23:11:15,953 INFO [train.py:886] (1/4) Epoch 27, batch 600, loss[loss=0.01085, audio_tagging_loss=0.01085, over 21158.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4708314.32 frames. 
], batch size: 107, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:11:57,273 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+01 3.188e+01 3.320e+01 3.491e+01 4.138e+01, threshold=6.639e+01, percent-clipped=0.0 2023-12-22 23:12:07,461 INFO [train.py:886] (1/4) Epoch 27, batch 650, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4761740.24 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:12:15,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=830440.0, ans=0.0 2023-12-22 23:12:16,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2023-12-22 23:12:20,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=830506.6666666666, ans=0.09899494936611666 2023-12-22 23:12:27,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-12-22 23:12:32,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=830573.3333333334, ans=15.0 2023-12-22 23:13:00,472 INFO [train.py:886] (1/4) Epoch 27, batch 700, loss[loss=0.01131, audio_tagging_loss=0.01131, over 24093.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4800779.08 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:13:41,261 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.146e+01 3.254e+01 3.417e+01 3.974e+01, threshold=6.508e+01, percent-clipped=0.0 2023-12-22 23:13:52,991 INFO [train.py:886] (1/4) Epoch 27, batch 750, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4835564.40 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:13:55,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=831106.6666666666, ans=0.0 2023-12-22 23:13:55,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=831106.6666666666, ans=0.2 2023-12-22 23:14:29,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=831306.6666666666, ans=0.125 2023-12-22 23:14:33,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=831373.3333333334, ans=0.125 2023-12-22 23:14:35,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=831373.3333333334, ans=0.2 2023-12-22 23:14:44,365 INFO [train.py:886] (1/4) Epoch 27, batch 800, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4865986.88 frames. 
], batch size: 100, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:15:06,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=831573.3333333334, ans=0.0 2023-12-22 23:15:22,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=831640.0, ans=0.125 2023-12-22 23:15:23,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2023-12-22 23:15:26,459 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.771e+01 3.100e+01 3.260e+01 3.401e+01 4.213e+01, threshold=6.521e+01, percent-clipped=0.0 2023-12-22 23:15:31,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=831706.6666666666, ans=10.0 2023-12-22 23:15:32,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=831706.6666666666, ans=0.2 2023-12-22 23:15:36,711 INFO [train.py:886] (1/4) Epoch 27, batch 850, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4887225.13 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:15:39,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-12-22 23:15:45,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-22 23:15:52,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=831840.0, ans=0.2 2023-12-22 23:15:55,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=831840.0, ans=0.125 2023-12-22 23:15:57,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=831906.6666666666, ans=0.0 2023-12-22 23:16:02,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=831906.6666666666, ans=0.125 2023-12-22 23:16:13,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=831973.3333333334, ans=0.2 2023-12-22 23:16:28,942 INFO [train.py:886] (1/4) Epoch 27, batch 900, loss[loss=0.01107, audio_tagging_loss=0.01107, over 21830.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4898510.67 frames. 
], batch size: 107, lr: 4.03e-03, grad_scale: 32.0 2023-12-22 23:16:51,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=832240.0, ans=0.125 2023-12-22 23:17:10,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832373.3333333334, ans=0.125 2023-12-22 23:17:11,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.850e+01 3.162e+01 3.282e+01 3.411e+01 3.879e+01, threshold=6.564e+01, percent-clipped=0.0 2023-12-22 23:17:13,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=832373.3333333334, ans=0.0 2023-12-22 23:17:14,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=832373.3333333334, ans=0.0 2023-12-22 23:17:16,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=832373.3333333334, ans=0.125 2023-12-22 23:17:19,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=832373.3333333334, ans=0.125 2023-12-22 23:17:20,757 INFO [train.py:886] (1/4) Epoch 27, batch 950, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24065.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4904167.75 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:17:30,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=832506.6666666666, ans=0.2 2023-12-22 23:18:00,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-12-22 23:18:13,711 INFO [train.py:886] (1/4) Epoch 27, batch 1000, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4915006.99 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:18:31,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=832840.0, ans=0.0 2023-12-22 23:18:47,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=832973.3333333334, ans=0.125 2023-12-22 23:18:54,480 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+01 3.109e+01 3.263e+01 3.461e+01 3.831e+01, threshold=6.526e+01, percent-clipped=0.0 2023-12-22 23:19:02,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=833040.0, ans=0.035 2023-12-22 23:19:02,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=833040.0, ans=0.125 2023-12-22 23:19:04,686 INFO [train.py:886] (1/4) Epoch 27, batch 1050, loss[loss=0.01181, audio_tagging_loss=0.01181, over 22167.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4921181.53 frames. 
], batch size: 107, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:19:11,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833106.6666666666, ans=0.1 2023-12-22 23:19:17,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0 2023-12-22 23:19:57,235 INFO [train.py:886] (1/4) Epoch 27, batch 1100, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4930410.07 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:20:00,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=833440.0, ans=0.0 2023-12-22 23:20:15,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833506.6666666666, ans=0.1 2023-12-22 23:20:17,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=833573.3333333334, ans=0.5 2023-12-22 23:20:24,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833573.3333333334, ans=0.1 2023-12-22 23:20:37,344 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.804e+01 3.138e+01 3.223e+01 3.425e+01 4.180e+01, threshold=6.446e+01, percent-clipped=0.0 2023-12-22 23:20:49,036 INFO [train.py:886] (1/4) Epoch 27, batch 1150, loss[loss=0.01347, audio_tagging_loss=0.01347, over 23983.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4937143.09 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:21:08,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0 2023-12-22 23:21:22,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=833973.3333333334, ans=0.5 2023-12-22 23:21:31,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834040.0, ans=0.1 2023-12-22 23:21:40,010 INFO [train.py:886] (1/4) Epoch 27, batch 1200, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4942281.32 frames. 
], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:21:52,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=834173.3333333334, ans=0.125 2023-12-22 23:22:03,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834240.0, ans=0.1 2023-12-22 23:22:05,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=834240.0, ans=0.0 2023-12-22 23:22:20,814 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.906e+01 3.144e+01 3.270e+01 3.430e+01 4.377e+01, threshold=6.540e+01, percent-clipped=0.0 2023-12-22 23:22:23,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=834373.3333333334, ans=0.2 2023-12-22 23:22:29,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-12-22 23:22:32,521 INFO [train.py:886] (1/4) Epoch 27, batch 1250, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4943290.13 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:22:32,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-12-22 23:22:45,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=834506.6666666666, ans=0.1 2023-12-22 23:22:48,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=834506.6666666666, ans=0.125 2023-12-22 23:22:51,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834573.3333333334, ans=0.1 2023-12-22 23:23:07,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=834640.0, ans=0.125 2023-12-22 23:23:08,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=834640.0, ans=0.2 2023-12-22 23:23:10,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=834640.0, ans=0.0 2023-12-22 23:23:12,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=834640.0, ans=0.125 2023-12-22 23:23:15,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-12-22 23:23:17,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0 2023-12-22 23:23:17,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=834706.6666666666, ans=0.07 2023-12-22 23:23:21,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.81 vs. 
limit=22.5 2023-12-22 23:23:23,910 INFO [train.py:886] (1/4) Epoch 27, batch 1300, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4945214.18 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:23:33,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2023-12-22 23:23:35,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834840.0, ans=0.1 2023-12-22 23:23:45,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2023-12-22 23:23:59,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834973.3333333334, ans=0.1 2023-12-22 23:23:59,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2023-12-22 23:24:04,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2023-12-22 23:24:05,917 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.886e+01 3.125e+01 3.290e+01 3.447e+01 4.331e+01, threshold=6.579e+01, percent-clipped=0.0 2023-12-22 23:24:08,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-12-22 23:24:15,427 INFO [train.py:886] (1/4) Epoch 27, batch 1350, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4944976.44 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:24:26,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=835173.3333333334, ans=0.0 2023-12-22 23:24:27,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=835173.3333333334, ans=0.125 2023-12-22 23:24:28,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=835173.3333333334, ans=0.2 2023-12-22 23:24:48,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835306.6666666666, ans=0.125 2023-12-22 23:24:51,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=835306.6666666666, ans=0.2 2023-12-22 23:24:52,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-12-22 23:24:53,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=12.0 2023-12-22 23:25:07,497 INFO [train.py:886] (1/4) Epoch 27, batch 1400, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4943671.19 frames. 
], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:25:12,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=835440.0, ans=0.1 2023-12-22 23:25:24,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=835506.6666666666, ans=0.07 2023-12-22 23:25:48,891 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.050e+01 3.190e+01 3.364e+01 4.111e+01, threshold=6.380e+01, percent-clipped=0.0 2023-12-22 23:25:52,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2023-12-22 23:25:58,449 INFO [train.py:886] (1/4) Epoch 27, batch 1450, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4941403.34 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:26:03,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=835773.3333333334, ans=0.05 2023-12-22 23:26:23,382 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.169e-01 2023-12-22 23:26:33,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2023-12-22 23:26:34,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2023-12-22 23:26:38,579 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:26:47,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=836040.0, ans=0.125 2023-12-22 23:26:51,928 INFO [train.py:886] (1/4) Epoch 27, batch 1500, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4947240.21 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:27:17,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836240.0, ans=0.0 2023-12-22 23:27:19,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=836240.0, ans=0.0 2023-12-22 23:27:32,533 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.805e+01 3.110e+01 3.282e+01 3.449e+01 4.134e+01, threshold=6.564e+01, percent-clipped=0.0 2023-12-22 23:27:42,852 INFO [train.py:886] (1/4) Epoch 27, batch 1550, loss[loss=0.01472, audio_tagging_loss=0.01472, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4949945.35 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0 2023-12-22 23:27:47,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-12-22 23:28:01,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. 
limit=15.0 2023-12-22 23:28:10,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=836573.3333333334, ans=0.125 2023-12-22 23:28:16,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=836640.0, ans=0.125 2023-12-22 23:28:22,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=836640.0, ans=0.1 2023-12-22 23:28:34,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=836773.3333333334, ans=0.0 2023-12-22 23:28:35,350 INFO [train.py:886] (1/4) Epoch 27, batch 1600, loss[loss=0.009262, audio_tagging_loss=0.009262, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4945881.22 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:28:37,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.19 vs. limit=12.0 2023-12-22 23:28:53,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=836840.0, ans=0.0 2023-12-22 23:29:03,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=836906.6666666666, ans=0.04949747468305833 2023-12-22 23:29:06,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=836973.3333333334, ans=0.0 2023-12-22 23:29:06,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=836973.3333333334, ans=0.125 2023-12-22 23:29:14,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-12-22 23:29:16,723 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.807e+01 3.145e+01 3.260e+01 3.442e+01 4.134e+01, threshold=6.520e+01, percent-clipped=0.0 2023-12-22 23:29:27,604 INFO [train.py:886] (1/4) Epoch 27, batch 1650, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4948762.05 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:30:10,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=837373.3333333334, ans=0.125 2023-12-22 23:30:19,112 INFO [train.py:886] (1/4) Epoch 27, batch 1700, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4943893.03 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:30:46,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. 
limit=15.0 2023-12-22 23:31:01,545 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.884e+01 3.145e+01 3.289e+01 3.408e+01 4.323e+01, threshold=6.579e+01, percent-clipped=0.0 2023-12-22 23:31:04,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=837706.6666666666, ans=0.125 2023-12-22 23:31:07,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=837706.6666666666, ans=0.125 2023-12-22 23:31:11,070 INFO [train.py:886] (1/4) Epoch 27, batch 1750, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4950517.51 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:31:14,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=837773.3333333334, ans=0.2 2023-12-22 23:31:15,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=837773.3333333334, ans=0.125 2023-12-22 23:31:15,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=837773.3333333334, ans=0.1 2023-12-22 23:31:29,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=837840.0, ans=0.0 2023-12-22 23:31:30,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=837840.0, ans=0.125 2023-12-22 23:31:50,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=837973.3333333334, ans=0.0 2023-12-22 23:31:58,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-12-22 23:32:03,068 INFO [train.py:886] (1/4) Epoch 27, batch 1800, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4956268.83 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:32:13,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.20 vs. limit=10.0 2023-12-22 23:32:13,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=838173.3333333334, ans=0.07 2023-12-22 23:32:14,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=838173.3333333334, ans=0.125 2023-12-22 23:32:16,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=838173.3333333334, ans=0.125 2023-12-22 23:32:16,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=838173.3333333334, ans=0.0 2023-12-22 23:32:24,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838240.0, ans=0.1 2023-12-22 23:32:25,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.77 vs. 
limit=15.0 2023-12-22 23:32:27,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2023-12-22 23:32:41,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2023-12-22 23:32:43,911 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.896e+01 3.173e+01 3.284e+01 3.409e+01 4.176e+01, threshold=6.568e+01, percent-clipped=0.0 2023-12-22 23:32:54,022 INFO [train.py:886] (1/4) Epoch 27, batch 1850, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4952063.94 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:33:10,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-12-22 23:33:10,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=838506.6666666666, ans=0.125 2023-12-22 23:33:27,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=838640.0, ans=0.0 2023-12-22 23:33:45,749 INFO [train.py:886] (1/4) Epoch 27, batch 1900, loss[loss=0.01287, audio_tagging_loss=0.01287, over 21849.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4938404.15 frames. ], batch size: 107, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:33:49,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-12-22 23:34:15,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838973.3333333334, ans=0.1 2023-12-22 23:34:25,791 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.935e+01 3.098e+01 3.249e+01 3.476e+01 4.111e+01, threshold=6.498e+01, percent-clipped=0.0 2023-12-22 23:34:29,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=839040.0, ans=0.0 2023-12-22 23:34:36,046 INFO [train.py:886] (1/4) Epoch 27, batch 1950, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4939820.95 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0 2023-12-22 23:34:45,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839106.6666666666, ans=0.1 2023-12-22 23:34:58,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-22 23:35:19,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=839373.3333333334, ans=0.07 2023-12-22 23:35:20,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=839373.3333333334, ans=0.2 2023-12-22 23:35:28,644 INFO [train.py:886] (1/4) Epoch 27, batch 2000, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4949125.20 frames. 
], batch size: 100, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:35:30,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=12.0 2023-12-22 23:35:37,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839506.6666666666, ans=0.1 2023-12-22 23:36:04,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=839640.0, ans=0.125 2023-12-22 23:36:09,863 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 3.085e+01 3.253e+01 3.416e+01 3.912e+01, threshold=6.506e+01, percent-clipped=0.0 2023-12-22 23:36:18,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=839706.6666666666, ans=0.125 2023-12-22 23:36:21,423 INFO [train.py:886] (1/4) Epoch 27, batch 2050, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4950518.79 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:36:28,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=839773.3333333334, ans=0.125 2023-12-22 23:36:28,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839773.3333333334, ans=0.1 2023-12-22 23:36:52,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-12-22 23:37:00,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=839973.3333333334, ans=0.125 2023-12-22 23:37:11,549 INFO [train.py:886] (1/4) Epoch 27, batch 2100, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4952450.76 frames. 
], batch size: 100, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:37:11,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=840106.6666666666, ans=0.0 2023-12-22 23:37:12,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=840106.6666666666, ans=0.125 2023-12-22 23:37:19,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=840106.6666666666, ans=0.125 2023-12-22 23:37:26,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840173.3333333334, ans=0.1 2023-12-22 23:37:41,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=840240.0, ans=0.0 2023-12-22 23:37:42,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=840306.6666666666, ans=0.0 2023-12-22 23:37:47,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=840306.6666666666, ans=0.0 2023-12-22 23:37:50,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=840306.6666666666, ans=0.125 2023-12-22 23:37:53,521 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.144e+01 3.259e+01 3.394e+01 3.827e+01, threshold=6.517e+01, percent-clipped=0.0 2023-12-22 23:37:55,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=840373.3333333334, ans=0.125 2023-12-22 23:37:58,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=840373.3333333334, ans=0.0 2023-12-22 23:38:00,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=840373.3333333334, ans=0.125 2023-12-22 23:38:02,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.26 vs. limit=10.0 2023-12-22 23:38:03,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-12-22 23:38:03,699 INFO [train.py:886] (1/4) Epoch 27, batch 2150, loss[loss=0.01762, audio_tagging_loss=0.01762, over 24949.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4955992.56 frames. 
], batch size: 100, lr: 4.01e-03, grad_scale: 64.0 2023-12-22 23:38:03,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=840440.0, ans=0.125 2023-12-22 23:38:41,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840640.0, ans=0.1 2023-12-22 23:38:43,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=840706.6666666666, ans=0.125 2023-12-22 23:38:47,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=840706.6666666666, ans=0.125 2023-12-22 23:38:53,877 INFO [train.py:886] (1/4) Epoch 27, batch 2200, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4951326.36 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:39:14,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=840906.6666666666, ans=0.0 2023-12-22 23:39:35,153 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:39:35,836 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.899e+01 3.169e+01 3.290e+01 3.439e+01 3.968e+01, threshold=6.580e+01, percent-clipped=0.0 2023-12-22 23:39:36,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=841040.0, ans=0.125 2023-12-22 23:39:45,344 INFO [train.py:886] (1/4) Epoch 27, batch 2250, loss[loss=0.0177, audio_tagging_loss=0.0177, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4946174.27 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:40:05,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=841240.0, ans=0.0 2023-12-22 23:40:25,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=841373.3333333334, ans=0.07 2023-12-22 23:40:28,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.61 vs. limit=22.5 2023-12-22 23:40:32,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=841373.3333333334, ans=0.125 2023-12-22 23:40:37,975 INFO [train.py:886] (1/4) Epoch 27, batch 2300, loss[loss=0.01366, audio_tagging_loss=0.01366, over 24750.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4943935.04 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:40:44,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=841440.0, ans=0.2 2023-12-22 23:41:02,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=841573.3333333334, ans=0.1 2023-12-22 23:41:13,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.51 vs. 
limit=10.0 2023-12-22 23:41:15,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2023-12-22 23:41:18,131 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.111e+01 3.239e+01 3.412e+01 3.909e+01, threshold=6.478e+01, percent-clipped=0.0 2023-12-22 23:41:27,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5 2023-12-22 23:41:27,657 INFO [train.py:886] (1/4) Epoch 27, batch 2350, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4943959.97 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:42:04,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=841973.3333333334, ans=0.025 2023-12-22 23:42:06,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-12-22 23:42:13,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2023-12-22 23:42:14,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842040.0, ans=0.0 2023-12-22 23:42:16,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-12-22 23:42:20,075 INFO [train.py:886] (1/4) Epoch 27, batch 2400, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4943490.54 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:42:25,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=842106.6666666666, ans=0.125 2023-12-22 23:42:32,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2023-12-22 23:42:35,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842173.3333333334, ans=0.1 2023-12-22 23:42:46,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=842240.0, ans=0.0 2023-12-22 23:43:00,772 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.830e+01 3.113e+01 3.225e+01 3.408e+01 4.094e+01, threshold=6.450e+01, percent-clipped=0.0 2023-12-22 23:43:03,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-12-22 23:43:10,990 INFO [train.py:886] (1/4) Epoch 27, batch 2450, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4942434.59 frames. 
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:43:12,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=842440.0, ans=0.2 2023-12-22 23:43:12,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2023-12-22 23:43:21,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=842506.6666666666, ans=0.0 2023-12-22 23:43:24,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=842506.6666666666, ans=0.09899494936611666 2023-12-22 23:43:33,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0 2023-12-22 23:43:55,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=842706.6666666666, ans=0.07 2023-12-22 23:44:03,522 INFO [train.py:886] (1/4) Epoch 27, batch 2500, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4940081.83 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:44:13,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=842840.0, ans=0.0 2023-12-22 23:44:28,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=842906.6666666666, ans=0.0 2023-12-22 23:44:30,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2023-12-22 23:44:44,155 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.187e+01 3.317e+01 3.432e+01 3.909e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-22 23:44:55,754 INFO [train.py:886] (1/4) Epoch 27, batch 2550, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4939620.29 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:45:03,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.11 vs. limit=22.5 2023-12-22 23:45:14,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=12.0 2023-12-22 23:45:15,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=843240.0, ans=0.125 2023-12-22 23:45:24,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-12-22 23:45:46,015 INFO [train.py:886] (1/4) Epoch 27, batch 2600, loss[loss=0.01517, audio_tagging_loss=0.01517, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4940342.05 frames. 
], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:45:50,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=843440.0, ans=0.125 2023-12-22 23:46:23,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=843640.0, ans=0.0 2023-12-22 23:46:27,695 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.891e+01 3.099e+01 3.252e+01 3.438e+01 3.884e+01, threshold=6.504e+01, percent-clipped=0.0 2023-12-22 23:46:38,022 INFO [train.py:886] (1/4) Epoch 27, batch 2650, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4942371.97 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:47:09,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-12-22 23:47:10,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843973.3333333334, ans=0.1 2023-12-22 23:47:19,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=844040.0, ans=0.125 2023-12-22 23:47:30,282 INFO [train.py:886] (1/4) Epoch 27, batch 2700, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4939553.58 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:47:34,280 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:47:49,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-12-22 23:48:04,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=844306.6666666666, ans=0.0 2023-12-22 23:48:11,090 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.112e+01 3.261e+01 3.405e+01 4.046e+01, threshold=6.522e+01, percent-clipped=0.0 2023-12-22 23:48:15,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=844373.3333333334, ans=0.04949747468305833 2023-12-22 23:48:18,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=844373.3333333334, ans=0.125 2023-12-22 23:48:21,325 INFO [train.py:886] (1/4) Epoch 27, batch 2750, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4942439.21 frames. 
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:48:34,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=844506.6666666666, ans=0.2 2023-12-22 23:48:47,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=844573.3333333334, ans=0.0 2023-12-22 23:48:48,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=844573.3333333334, ans=0.0 2023-12-22 23:49:09,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=844706.6666666666, ans=0.125 2023-12-22 23:49:13,584 INFO [train.py:886] (1/4) Epoch 27, batch 2800, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24042.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4944448.64 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:49:19,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=844773.3333333334, ans=0.125 2023-12-22 23:49:37,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=844906.6666666666, ans=0.1 2023-12-22 23:49:54,270 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.754e+01 3.194e+01 3.305e+01 3.452e+01 3.886e+01, threshold=6.610e+01, percent-clipped=0.0 2023-12-22 23:50:05,157 INFO [train.py:886] (1/4) Epoch 27, batch 2850, loss[loss=0.0153, audio_tagging_loss=0.0153, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4940884.75 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:50:18,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=845173.3333333334, ans=0.125 2023-12-22 23:50:38,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=845306.6666666666, ans=0.0 2023-12-22 23:50:42,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=845306.6666666666, ans=0.125 2023-12-22 23:50:50,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=15.0 2023-12-22 23:50:52,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=845373.3333333334, ans=0.025 2023-12-22 23:50:56,337 INFO [train.py:886] (1/4) Epoch 27, batch 2900, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4932795.83 frames. 
], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:50:57,476 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:51:18,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=845573.3333333334, ans=0.125 2023-12-22 23:51:21,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=845573.3333333334, ans=0.0 2023-12-22 23:51:37,798 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.117e+01 3.211e+01 3.387e+01 4.000e+01, threshold=6.423e+01, percent-clipped=0.0 2023-12-22 23:51:40,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=845706.6666666666, ans=0.125 2023-12-22 23:51:48,686 INFO [train.py:886] (1/4) Epoch 27, batch 2950, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4933259.12 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:51:48,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=845773.3333333334, ans=0.125 2023-12-22 23:51:49,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2023-12-22 23:52:13,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.61 vs. limit=22.5 2023-12-22 23:52:39,825 INFO [train.py:886] (1/4) Epoch 27, batch 3000, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4939606.25 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:52:39,826 INFO [train.py:909] (1/4) Computing validation loss 2023-12-22 23:52:56,417 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5923, 2.7923, 4.2352, 3.7976], device='cuda:1') 2023-12-22 23:53:00,359 INFO [train.py:917] (1/4) Epoch 27, validation: loss=0.03311, audio_tagging_loss=0.03311, over 3737520.00 frames. 2023-12-22 23:53:00,360 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-22 23:53:18,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=12.0 2023-12-22 23:53:33,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=846306.6666666666, ans=0.04949747468305833 2023-12-22 23:53:35,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=12.0 2023-12-22 23:53:37,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=846306.6666666666, ans=0.125 2023-12-22 23:53:42,510 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.897e+01 3.118e+01 3.260e+01 3.397e+01 4.017e+01, threshold=6.519e+01, percent-clipped=0.0 2023-12-22 23:53:53,393 INFO [train.py:886] (1/4) Epoch 27, batch 3050, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24005.00 frames. 
], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4944482.92 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:53:53,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=15.0 2023-12-22 23:54:13,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=846573.3333333334, ans=0.0 2023-12-22 23:54:29,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=846640.0, ans=0.0 2023-12-22 23:54:31,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846640.0, ans=0.1 2023-12-22 23:54:44,005 INFO [train.py:886] (1/4) Epoch 27, batch 3100, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4949859.43 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:54:53,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-12-22 23:55:02,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=846840.0, ans=0.07 2023-12-22 23:55:09,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=846906.6666666666, ans=0.0 2023-12-22 23:55:16,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-12-22 23:55:26,097 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.155e+01 3.323e+01 3.468e+01 4.091e+01, threshold=6.646e+01, percent-clipped=0.0 2023-12-22 23:55:31,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=847040.0, ans=0.125 2023-12-22 23:55:31,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=847040.0, ans=0.0 2023-12-22 23:55:35,700 INFO [train.py:886] (1/4) Epoch 27, batch 3150, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4946809.54 frames. 
], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:55:39,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=847106.6666666666, ans=0.0 2023-12-22 23:55:40,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=847106.6666666666, ans=0.125 2023-12-22 23:55:49,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=847173.3333333334, ans=0.2 2023-12-22 23:56:09,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=847306.6666666666, ans=0.0 2023-12-22 23:56:11,226 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:56:27,859 INFO [train.py:886] (1/4) Epoch 27, batch 3200, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4943880.80 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:56:31,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=847440.0, ans=0.07 2023-12-22 23:56:42,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2023-12-22 23:56:51,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=847573.3333333334, ans=0.125 2023-12-22 23:57:08,444 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.162e+01 3.265e+01 3.446e+01 4.055e+01, threshold=6.529e+01, percent-clipped=0.0 2023-12-22 23:57:17,865 INFO [train.py:886] (1/4) Epoch 27, batch 3250, loss[loss=0.01234, audio_tagging_loss=0.01234, over 22469.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4939764.87 frames. ], batch size: 107, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:57:24,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.18 vs. limit=10.0 2023-12-22 23:57:27,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=847773.3333333334, ans=0.0 2023-12-22 23:57:32,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=847840.0, ans=0.0 2023-12-22 23:57:40,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs. 
limit=15.0 2023-12-22 23:57:47,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=847906.6666666666, ans=0.125 2023-12-22 23:57:58,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=847973.3333333334, ans=0.0 2023-12-22 23:58:02,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=848040.0, ans=0.1 2023-12-22 23:58:04,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848040.0, ans=0.1 2023-12-22 23:58:06,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=848040.0, ans=0.125 2023-12-22 23:58:06,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=848040.0, ans=0.125 2023-12-22 23:58:10,820 INFO [train.py:886] (1/4) Epoch 27, batch 3300, loss[loss=0.01255, audio_tagging_loss=0.01255, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4936620.14 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:58:11,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=848106.6666666666, ans=0.05 2023-12-22 23:58:14,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.83 vs. limit=22.5 2023-12-22 23:58:17,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0 2023-12-22 23:58:27,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=848173.3333333334, ans=0.2 2023-12-22 23:58:34,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2023-12-22 23:58:42,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=848306.6666666666, ans=0.2 2023-12-22 23:58:45,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-12-22 23:58:48,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=848306.6666666666, ans=0.125 2023-12-22 23:58:51,326 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.800e+01 3.157e+01 3.288e+01 3.417e+01 4.683e+01, threshold=6.576e+01, percent-clipped=0.0 2023-12-22 23:58:58,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=848373.3333333334, ans=0.0 2023-12-22 23:59:02,303 INFO [train.py:886] (1/4) Epoch 27, batch 3350, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4936915.72 frames. 
], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:59:11,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=848440.0, ans=0.0 2023-12-22 23:59:12,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=848506.6666666666, ans=0.125 2023-12-22 23:59:13,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2023-12-22 23:59:17,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=848506.6666666666, ans=0.125 2023-12-22 23:59:23,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=848573.3333333334, ans=0.0 2023-12-22 23:59:28,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848573.3333333334, ans=0.1 2023-12-22 23:59:38,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=848640.0, ans=0.125 2023-12-22 23:59:53,241 INFO [train.py:886] (1/4) Epoch 27, batch 3400, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4937225.86 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-23 00:00:01,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=848773.3333333334, ans=0.125 2023-12-23 00:00:04,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=848840.0, ans=0.125 2023-12-23 00:00:30,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2023-12-23 00:00:33,028 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.887e+01 3.185e+01 3.323e+01 3.461e+01 4.226e+01, threshold=6.645e+01, percent-clipped=0.0 2023-12-23 00:00:45,419 INFO [train.py:886] (1/4) Epoch 27, batch 3450, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4933879.18 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-23 00:00:52,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=849106.6666666666, ans=0.09899494936611666 2023-12-23 00:01:34,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=849373.3333333334, ans=0.0 2023-12-23 00:01:36,107 INFO [train.py:886] (1/4) Epoch 27, batch 3500, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4935622.98 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:01:49,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=849506.6666666666, ans=0.0 2023-12-23 00:01:51,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. 
limit=15.0 2023-12-23 00:02:06,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=849573.3333333334, ans=0.0 2023-12-23 00:02:08,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=849640.0, ans=0.0 2023-12-23 00:02:20,117 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.138e+01 3.283e+01 3.482e+01 4.072e+01, threshold=6.566e+01, percent-clipped=0.0 2023-12-23 00:02:29,539 INFO [train.py:886] (1/4) Epoch 27, batch 3550, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4939408.79 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:02:40,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849840.0, ans=0.1 2023-12-23 00:02:41,350 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:02:59,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=849906.6666666666, ans=0.5 2023-12-23 00:03:13,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=850040.0, ans=0.125 2023-12-23 00:03:22,010 INFO [train.py:886] (1/4) Epoch 27, batch 3600, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4945884.54 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:03:22,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-12-23 00:03:28,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=12.0 2023-12-23 00:03:29,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=850106.6666666666, ans=0.0 2023-12-23 00:04:02,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=850373.3333333334, ans=0.0 2023-12-23 00:04:03,114 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.828e+01 3.137e+01 3.279e+01 3.466e+01 4.135e+01, threshold=6.558e+01, percent-clipped=0.0 2023-12-23 00:04:04,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-12-23 00:04:12,581 INFO [train.py:886] (1/4) Epoch 27, batch 3650, loss[loss=0.0168, audio_tagging_loss=0.0168, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4944850.25 frames. 
], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:04:31,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=850506.6666666666, ans=0.0 2023-12-23 00:04:41,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=850573.3333333334, ans=0.125 2023-12-23 00:04:43,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2023-12-23 00:05:02,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850706.6666666666, ans=0.1 2023-12-23 00:05:02,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=12.0 2023-12-23 00:05:05,192 INFO [train.py:886] (1/4) Epoch 27, batch 3700, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4952203.86 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:05:06,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=850773.3333333334, ans=0.0 2023-12-23 00:05:11,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850773.3333333334, ans=0.1 2023-12-23 00:05:29,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850906.6666666666, ans=0.1 2023-12-23 00:05:30,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=850906.6666666666, ans=0.125 2023-12-23 00:05:36,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=850973.3333333334, ans=0.125 2023-12-23 00:05:46,646 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.190e+01 3.298e+01 3.442e+01 3.856e+01, threshold=6.597e+01, percent-clipped=0.0 2023-12-23 00:05:52,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=851040.0, ans=0.025 2023-12-23 00:05:56,026 INFO [train.py:886] (1/4) Epoch 27, batch 3750, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4949976.22 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:06:21,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2023-12-23 00:06:24,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=851240.0, ans=0.0 2023-12-23 00:06:26,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=851306.6666666666, ans=0.125 2023-12-23 00:06:38,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. 
limit=15.0 2023-12-23 00:06:43,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=851373.3333333334, ans=0.125 2023-12-23 00:06:48,653 INFO [train.py:886] (1/4) Epoch 27, batch 3800, loss[loss=0.01437, audio_tagging_loss=0.01437, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4949973.90 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:06:48,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=851440.0, ans=0.0 2023-12-23 00:06:54,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-12-23 00:06:56,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=851440.0, ans=0.07 2023-12-23 00:07:10,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=15.0 2023-12-23 00:07:23,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=851640.0, ans=0.2 2023-12-23 00:07:27,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=851640.0, ans=0.0 2023-12-23 00:07:29,310 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.838e+01 3.145e+01 3.329e+01 3.462e+01 3.871e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-23 00:07:40,927 INFO [train.py:886] (1/4) Epoch 27, batch 3850, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4950829.85 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:07:43,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851773.3333333334, ans=0.1 2023-12-23 00:07:55,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=851840.0, ans=0.0 2023-12-23 00:08:05,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=851906.6666666666, ans=0.125 2023-12-23 00:08:15,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=851973.3333333334, ans=0.0 2023-12-23 00:08:19,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=851973.3333333334, ans=0.125 2023-12-23 00:08:21,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852040.0, ans=0.0 2023-12-23 00:08:29,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=852040.0, ans=0.0 2023-12-23 00:08:32,563 INFO [train.py:886] (1/4) Epoch 27, batch 3900, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4949384.44 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:08:39,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. 
limit=15.0 2023-12-23 00:08:45,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2023-12-23 00:08:48,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=852173.3333333334, ans=0.0 2023-12-23 00:09:05,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=852306.6666666666, ans=0.125 2023-12-23 00:09:15,478 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.748e+01 3.126e+01 3.278e+01 3.398e+01 3.984e+01, threshold=6.555e+01, percent-clipped=0.0 2023-12-23 00:09:25,093 INFO [train.py:886] (1/4) Epoch 27, batch 3950, loss[loss=0.01422, audio_tagging_loss=0.01422, over 22223.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4950079.17 frames. ], batch size: 107, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:09:27,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=852440.0, ans=0.125 2023-12-23 00:09:48,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=852573.3333333334, ans=0.125 2023-12-23 00:09:53,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852573.3333333334, ans=0.1 2023-12-23 00:09:53,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=852573.3333333334, ans=0.125 2023-12-23 00:10:01,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2023-12-23 00:10:07,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=852706.6666666666, ans=0.125 2023-12-23 00:10:14,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=852706.6666666666, ans=0.0 2023-12-23 00:10:16,916 INFO [train.py:886] (1/4) Epoch 27, batch 4000, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4952839.09 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 128.0 2023-12-23 00:10:24,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. 
limit=15.0 2023-12-23 00:10:27,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=852840.0, ans=0.125 2023-12-23 00:10:33,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852840.0, ans=0.1 2023-12-23 00:10:33,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=852840.0, ans=0.125 2023-12-23 00:10:33,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=852840.0, ans=0.2 2023-12-23 00:10:42,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=852906.6666666666, ans=0.125 2023-12-23 00:10:54,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0 2023-12-23 00:10:59,973 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.188e+01 3.348e+01 3.452e+01 3.928e+01, threshold=6.695e+01, percent-clipped=0.0 2023-12-23 00:11:05,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=853040.0, ans=0.0 2023-12-23 00:11:08,498 INFO [train.py:886] (1/4) Epoch 27, batch 4050, loss[loss=0.01473, audio_tagging_loss=0.01473, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4956175.08 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:11:21,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=853173.3333333334, ans=0.0 2023-12-23 00:11:23,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=853173.3333333334, ans=10.0 2023-12-23 00:11:24,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=853173.3333333334, ans=0.0 2023-12-23 00:11:30,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=853240.0, ans=0.125 2023-12-23 00:11:31,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=853240.0, ans=0.125 2023-12-23 00:11:32,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=853240.0, ans=0.0 2023-12-23 00:11:35,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=853240.0, ans=0.2 2023-12-23 00:11:47,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=853306.6666666666, ans=0.05 2023-12-23 00:11:48,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=853306.6666666666, ans=0.035 2023-12-23 00:11:49,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=12.0 2023-12-23 00:12:02,804 INFO [train.py:886] (1/4) Epoch 27, batch 4100, loss[loss=0.009239, audio_tagging_loss=0.009239, over 24044.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4944621.90 frames. 
], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:12:09,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2023-12-23 00:12:18,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=22.5 2023-12-23 00:12:22,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=853573.3333333334, ans=0.125 2023-12-23 00:12:28,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=853573.3333333334, ans=0.0 2023-12-23 00:12:29,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853573.3333333334, ans=0.1 2023-12-23 00:12:30,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=853573.3333333334, ans=0.0 2023-12-23 00:12:35,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=853640.0, ans=0.125 2023-12-23 00:12:36,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=853640.0, ans=0.2 2023-12-23 00:12:38,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=853640.0, ans=0.2 2023-12-23 00:12:40,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=853640.0, ans=0.125 2023-12-23 00:12:44,653 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.808e+01 3.219e+01 3.295e+01 3.462e+01 3.940e+01, threshold=6.591e+01, percent-clipped=0.0 2023-12-23 00:12:48,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=22.5 2023-12-23 00:12:54,572 INFO [train.py:886] (1/4) Epoch 27, batch 4150, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4945372.23 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:13:00,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=853773.3333333334, ans=0.125 2023-12-23 00:13:05,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=853840.0, ans=0.2 2023-12-23 00:13:22,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=853906.6666666666, ans=0.015 2023-12-23 00:13:30,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=853973.3333333334, ans=0.0 2023-12-23 00:13:31,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=853973.3333333334, ans=0.2 2023-12-23 00:13:46,162 INFO [train.py:886] (1/4) Epoch 27, batch 4200, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4942059.10 frames. 
], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:13:46,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=854106.6666666666, ans=0.2 2023-12-23 00:13:48,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=854106.6666666666, ans=0.05 2023-12-23 00:13:49,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=854106.6666666666, ans=0.125 2023-12-23 00:14:00,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=854173.3333333334, ans=0.2 2023-12-23 00:14:01,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854173.3333333334, ans=0.1 2023-12-23 00:14:01,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=854173.3333333334, ans=0.1 2023-12-23 00:14:21,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=854306.6666666666, ans=0.0 2023-12-23 00:14:28,652 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.773e+01 3.191e+01 3.309e+01 3.476e+01 4.229e+01, threshold=6.619e+01, percent-clipped=0.0 2023-12-23 00:14:37,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=854373.3333333334, ans=0.125 2023-12-23 00:14:38,661 INFO [train.py:886] (1/4) Epoch 27, batch 4250, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4941065.10 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:14:44,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=854440.0, ans=0.125 2023-12-23 00:14:45,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=854440.0, ans=0.125 2023-12-23 00:15:13,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=854640.0, ans=0.125 2023-12-23 00:15:25,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2023-12-23 00:15:26,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=854706.6666666666, ans=0.125 2023-12-23 00:15:28,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=854706.6666666666, ans=0.125 2023-12-23 00:15:28,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854706.6666666666, ans=0.1 2023-12-23 00:15:29,740 INFO [train.py:886] (1/4) Epoch 27, batch 4300, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4942201.81 frames. 
], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:15:44,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=854840.0, ans=0.125 2023-12-23 00:15:45,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=854840.0, ans=0.125 2023-12-23 00:15:45,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=854840.0, ans=0.0 2023-12-23 00:15:56,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854906.6666666666, ans=0.0 2023-12-23 00:15:57,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2023-12-23 00:16:00,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=854973.3333333334, ans=0.0 2023-12-23 00:16:13,511 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.183e+01 3.299e+01 3.496e+01 3.905e+01, threshold=6.598e+01, percent-clipped=0.0 2023-12-23 00:16:22,709 INFO [train.py:886] (1/4) Epoch 27, batch 4350, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4950210.61 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:16:45,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=855240.0, ans=0.125 2023-12-23 00:16:48,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=855240.0, ans=0.0 2023-12-23 00:16:49,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=855240.0, ans=0.125 2023-12-23 00:17:01,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=855306.6666666666, ans=0.07 2023-12-23 00:17:13,782 INFO [train.py:886] (1/4) Epoch 27, batch 4400, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4945544.06 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:17:18,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=855440.0, ans=0.0 2023-12-23 00:17:23,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=855506.6666666666, ans=0.125 2023-12-23 00:17:32,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855506.6666666666, ans=0.1 2023-12-23 00:17:46,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=855640.0, ans=0.125 2023-12-23 00:17:55,453 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.876e+01 3.175e+01 3.302e+01 3.505e+01 4.332e+01, threshold=6.604e+01, percent-clipped=0.0 2023-12-23 00:18:02,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. 
limit=15.0 2023-12-23 00:18:03,949 INFO [train.py:886] (1/4) Epoch 27, batch 4450, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4942565.32 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:18:24,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=855906.6666666666, ans=0.1 2023-12-23 00:18:26,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=855906.6666666666, ans=0.125 2023-12-23 00:18:29,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=855906.6666666666, ans=0.125 2023-12-23 00:18:32,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=855906.6666666666, ans=0.125 2023-12-23 00:18:32,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=855906.6666666666, ans=0.07 2023-12-23 00:18:33,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=855973.3333333334, ans=0.125 2023-12-23 00:18:41,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=855973.3333333334, ans=0.0 2023-12-23 00:18:52,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=856040.0, ans=0.125 2023-12-23 00:18:55,923 INFO [train.py:886] (1/4) Epoch 27, batch 4500, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4939997.44 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:19:12,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=856173.3333333334, ans=0.0 2023-12-23 00:19:13,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=12.0 2023-12-23 00:19:28,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=856306.6666666666, ans=0.125 2023-12-23 00:19:36,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-12-23 00:19:37,245 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.151e+01 3.331e+01 3.446e+01 3.976e+01, threshold=6.663e+01, percent-clipped=0.0 2023-12-23 00:19:45,689 INFO [train.py:886] (1/4) Epoch 27, batch 4550, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4946500.70 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:19:51,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-12-23 00:19:55,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. 
limit=22.5 2023-12-23 00:20:06,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=856506.6666666666, ans=0.125 2023-12-23 00:20:08,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=856573.3333333334, ans=0.125 2023-12-23 00:20:12,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=856573.3333333334, ans=0.95 2023-12-23 00:20:36,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=856706.6666666666, ans=0.025 2023-12-23 00:20:38,709 INFO [train.py:886] (1/4) Epoch 27, batch 4600, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4957050.94 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:20:44,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856773.3333333334, ans=0.1 2023-12-23 00:21:07,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=856906.6666666666, ans=0.125 2023-12-23 00:21:10,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0 2023-12-23 00:21:12,845 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:21:20,174 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.147e+01 3.251e+01 3.413e+01 3.679e+01, threshold=6.502e+01, percent-clipped=0.0 2023-12-23 00:21:20,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.09 vs. limit=10.0 2023-12-23 00:21:30,186 INFO [train.py:886] (1/4) Epoch 27, batch 4650, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4958479.75 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:21:31,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-12-23 00:21:44,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=857173.3333333334, ans=0.0 2023-12-23 00:21:44,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-12-23 00:21:46,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=857173.3333333334, ans=0.125 2023-12-23 00:21:47,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=857173.3333333334, ans=0.0 2023-12-23 00:22:02,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. 
limit=22.5 2023-12-23 00:22:20,068 INFO [train.py:886] (1/4) Epoch 27, batch 4700, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4955627.57 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:22:20,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=857440.0, ans=0.2 2023-12-23 00:22:26,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=857440.0, ans=0.0 2023-12-23 00:22:29,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.74 vs. limit=22.5 2023-12-23 00:22:34,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-12-23 00:22:51,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-12-23 00:22:57,964 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.951e+01 3.169e+01 3.333e+01 3.511e+01 4.698e+01, threshold=6.666e+01, percent-clipped=0.0 2023-12-23 00:22:59,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.01 vs. limit=22.5 2023-12-23 00:23:00,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=857706.6666666666, ans=0.125 2023-12-23 00:23:07,084 INFO [train.py:886] (1/4) Epoch 27, batch 4750, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4951593.45 frames. ], batch size: 99, lr: 3.96e-03, grad_scale: 64.0 2023-12-23 00:23:11,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=857773.3333333334, ans=0.2 2023-12-23 00:23:14,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857773.3333333334, ans=0.1 2023-12-23 00:23:42,401 INFO [train.py:886] (1/4) Epoch 28, batch 0, loss[loss=0.02838, audio_tagging_loss=0.02838, over 24012.00 frames. ], tot_loss[loss=0.02838, audio_tagging_loss=0.02838, over 24012.00 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:23:42,401 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 00:24:03,709 INFO [train.py:917] (1/4) Epoch 28, validation: loss=0.03329, audio_tagging_loss=0.03329, over 3737520.00 frames. 
2023-12-23 00:24:03,709 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 00:24:19,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857946.6666666666, ans=0.1 2023-12-23 00:24:26,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=858013.3333333334, ans=0.0 2023-12-23 00:24:26,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=858013.3333333334, ans=0.125 2023-12-23 00:24:34,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=858080.0, ans=0.2 2023-12-23 00:24:47,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2023-12-23 00:24:53,479 INFO [train.py:886] (1/4) Epoch 28, batch 50, loss[loss=0.01626, audio_tagging_loss=0.01626, over 25000.00 frames. ], tot_loss[loss=0.02085, audio_tagging_loss=0.02085, over 1118052.42 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:24:59,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858213.3333333334, ans=0.1 2023-12-23 00:25:17,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=858346.6666666666, ans=0.2 2023-12-23 00:25:17,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=858346.6666666666, ans=0.125 2023-12-23 00:25:20,166 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.879e+01 3.617e+01 4.003e+01 4.694e+01 1.109e+02, threshold=8.005e+01, percent-clipped=9.0 2023-12-23 00:25:30,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2023-12-23 00:25:44,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=858546.6666666666, ans=0.2 2023-12-23 00:25:45,095 INFO [train.py:886] (1/4) Epoch 28, batch 100, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 1975459.68 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:25:45,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=858546.6666666666, ans=0.125 2023-12-23 00:25:50,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. 
limit=6.0 2023-12-23 00:26:10,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858680.0, ans=0.1 2023-12-23 00:26:18,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=858746.6666666666, ans=0.0 2023-12-23 00:26:20,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=858746.6666666666, ans=0.2 2023-12-23 00:26:21,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0 2023-12-23 00:26:25,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=858813.3333333334, ans=0.2 2023-12-23 00:26:25,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5 2023-12-23 00:26:35,731 INFO [train.py:886] (1/4) Epoch 28, batch 150, loss[loss=0.01297, audio_tagging_loss=0.01297, over 25000.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 2630657.86 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:27:02,517 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.280e+01 3.440e+01 3.594e+01 4.041e+01, threshold=6.880e+01, percent-clipped=0.0 2023-12-23 00:27:11,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=859080.0, ans=0.1 2023-12-23 00:27:22,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2023-12-23 00:27:27,349 INFO [train.py:886] (1/4) Epoch 28, batch 200, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 3152175.45 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:28:05,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=859413.3333333334, ans=0.0 2023-12-23 00:28:17,873 INFO [train.py:886] (1/4) Epoch 28, batch 250, loss[loss=0.01569, audio_tagging_loss=0.01569, over 24948.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 3554445.84 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:28:26,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=859546.6666666666, ans=0.125 2023-12-23 00:28:26,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=859546.6666666666, ans=0.125 2023-12-23 00:28:27,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=859613.3333333334, ans=0.125 2023-12-23 00:28:39,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=859680.0, ans=0.125 2023-12-23 00:28:39,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. 
limit=15.0 2023-12-23 00:28:43,837 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.923e+01 3.198e+01 3.303e+01 3.432e+01 3.896e+01, threshold=6.605e+01, percent-clipped=0.0 2023-12-23 00:28:51,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2023-12-23 00:28:56,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=859746.6666666666, ans=0.0 2023-12-23 00:29:08,729 INFO [train.py:886] (1/4) Epoch 28, batch 300, loss[loss=0.01384, audio_tagging_loss=0.01384, over 24750.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 3856943.70 frames. ], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:29:08,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=859880.0, ans=0.125 2023-12-23 00:29:21,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=859946.6666666666, ans=0.0 2023-12-23 00:29:26,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=859946.6666666666, ans=0.0 2023-12-23 00:29:27,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=859946.6666666666, ans=0.0 2023-12-23 00:29:35,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2023-12-23 00:29:36,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=860013.3333333334, ans=0.125 2023-12-23 00:29:59,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860146.6666666666, ans=0.1 2023-12-23 00:30:01,099 INFO [train.py:886] (1/4) Epoch 28, batch 350, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4097075.10 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:30:07,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0 2023-12-23 00:30:08,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-12-23 00:30:26,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860346.6666666666, ans=0.1 2023-12-23 00:30:28,421 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.837e+01 3.152e+01 3.318e+01 3.475e+01 4.174e+01, threshold=6.637e+01, percent-clipped=0.0 2023-12-23 00:30:37,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-12-23 00:30:37,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. 
limit=15.0 2023-12-23 00:30:39,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.25 vs. limit=10.0 2023-12-23 00:30:41,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=860480.0, ans=15.0 2023-12-23 00:30:45,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=860480.0, ans=0.2 2023-12-23 00:30:52,542 INFO [train.py:886] (1/4) Epoch 28, batch 400, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4280500.33 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:30:52,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=860546.6666666666, ans=0.125 2023-12-23 00:31:01,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860546.6666666666, ans=0.1 2023-12-23 00:31:06,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=860613.3333333334, ans=0.125 2023-12-23 00:31:17,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=860680.0, ans=0.125 2023-12-23 00:31:18,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=860680.0, ans=0.125 2023-12-23 00:31:21,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=860680.0, ans=0.2 2023-12-23 00:31:21,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2023-12-23 00:31:31,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=860746.6666666666, ans=0.125 2023-12-23 00:31:35,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860813.3333333334, ans=0.1 2023-12-23 00:31:44,311 INFO [train.py:886] (1/4) Epoch 28, batch 450, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4431303.62 frames. 
], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:31:55,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=860946.6666666666, ans=0.125 2023-12-23 00:32:11,566 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.702e+01 3.063e+01 3.224e+01 3.410e+01 3.950e+01, threshold=6.447e+01, percent-clipped=0.0 2023-12-23 00:32:20,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=861080.0, ans=0.0 2023-12-23 00:32:20,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=861080.0, ans=0.2 2023-12-23 00:32:29,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=861146.6666666666, ans=0.0 2023-12-23 00:32:37,191 INFO [train.py:886] (1/4) Epoch 28, batch 500, loss[loss=0.01562, audio_tagging_loss=0.01562, over 25000.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4550813.38 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:32:45,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861213.3333333334, ans=0.1 2023-12-23 00:32:51,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=861280.0, ans=0.95 2023-12-23 00:33:05,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861346.6666666666, ans=0.0 2023-12-23 00:33:23,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-12-23 00:33:27,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2023-12-23 00:33:28,210 INFO [train.py:886] (1/4) Epoch 28, batch 550, loss[loss=0.01462, audio_tagging_loss=0.01462, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4636047.09 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:33:40,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=861613.3333333334, ans=0.2 2023-12-23 00:33:54,957 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.118e+01 3.304e+01 3.442e+01 3.885e+01, threshold=6.607e+01, percent-clipped=0.0 2023-12-23 00:33:58,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2023-12-23 00:34:02,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=861746.6666666666, ans=0.125 2023-12-23 00:34:20,602 INFO [train.py:886] (1/4) Epoch 28, batch 600, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24947.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4709838.80 frames. 
], batch size: 100, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:34:24,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=861880.0, ans=0.125
2023-12-23 00:34:25,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.05 vs. limit=22.5
2023-12-23 00:34:31,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=861946.6666666666, ans=0.125
2023-12-23 00:34:39,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=861946.6666666666, ans=0.2
2023-12-23 00:35:11,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=862213.3333333334, ans=0.125
2023-12-23 00:35:12,858 INFO [train.py:886] (1/4) Epoch 28, batch 650, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4759399.72 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:35:20,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=862213.3333333334, ans=0.2
2023-12-23 00:35:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=862280.0, ans=0.0
2023-12-23 00:35:30,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=862280.0, ans=0.125
2023-12-23 00:35:38,971 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.866e+01 3.196e+01 3.336e+01 3.480e+01 3.806e+01, threshold=6.671e+01, percent-clipped=0.0
2023-12-23 00:35:43,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=12.0
2023-12-23 00:35:48,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=862413.3333333334, ans=0.0
2023-12-23 00:36:01,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=862480.0, ans=0.0
2023-12-23 00:36:03,743 INFO [train.py:886] (1/4) Epoch 28, batch 700, loss[loss=0.01212, audio_tagging_loss=0.01212, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4801637.37 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:36:18,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=862613.3333333334, ans=0.125
2023-12-23 00:36:24,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=862680.0, ans=0.0
2023-12-23 00:36:24,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0
2023-12-23 00:36:55,336 INFO [train.py:886] (1/4) Epoch 28, batch 750, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4836596.10 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:37:00,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=862880.0, ans=0.0
2023-12-23 00:37:02,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862880.0, ans=0.1
2023-12-23 00:37:18,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=863013.3333333334, ans=0.0
2023-12-23 00:37:19,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=863013.3333333334, ans=0.125
2023-12-23 00:37:23,190 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.734e+01 3.104e+01 3.286e+01 3.429e+01 3.776e+01, threshold=6.573e+01, percent-clipped=0.0
2023-12-23 00:37:29,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=863080.0, ans=0.125
2023-12-23 00:37:29,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=863080.0, ans=0.125
2023-12-23 00:37:45,967 INFO [train.py:886] (1/4) Epoch 28, batch 800, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4866530.73 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:37:46,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0
2023-12-23 00:38:17,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=863413.3333333334, ans=0.07
2023-12-23 00:38:19,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=863413.3333333334, ans=0.5
2023-12-23 00:38:33,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863480.0, ans=0.1
2023-12-23 00:38:39,116 INFO [train.py:886] (1/4) Epoch 28, batch 850, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4888450.59 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:38:55,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=863613.3333333334, ans=0.0
2023-12-23 00:39:06,379 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.957e+01 3.161e+01 3.315e+01 3.496e+01 3.961e+01, threshold=6.630e+01, percent-clipped=0.0
2023-12-23 00:39:13,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863746.6666666666, ans=0.125
2023-12-23 00:39:31,542 INFO [train.py:886] (1/4) Epoch 28, batch 900, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4907073.88 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:39:37,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=863880.0, ans=0.125
2023-12-23 00:39:39,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=863880.0, ans=0.125
2023-12-23 00:39:51,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5
2023-12-23 00:39:59,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=864013.3333333334, ans=0.125
2023-12-23 00:40:09,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=864080.0, ans=0.125
2023-12-23 00:40:13,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=864146.6666666666, ans=0.035
2023-12-23 00:40:21,516 INFO [train.py:886] (1/4) Epoch 28, batch 950, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4915335.61 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:40:32,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=864280.0, ans=0.0
2023-12-23 00:40:33,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=864280.0, ans=10.0
2023-12-23 00:40:38,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=864280.0, ans=0.125
2023-12-23 00:40:40,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=864280.0, ans=0.125
2023-12-23 00:40:47,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=864346.6666666666, ans=0.125
2023-12-23 00:40:48,395 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.699e+01 3.226e+01 3.347e+01 3.519e+01 4.208e+01, threshold=6.694e+01, percent-clipped=0.0
2023-12-23 00:40:50,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=864346.6666666666, ans=0.125
2023-12-23 00:40:52,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0
2023-12-23 00:40:57,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=864413.3333333334, ans=0.125
2023-12-23 00:41:10,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=864480.0, ans=0.07
2023-12-23 00:41:14,088 INFO [train.py:886] (1/4) Epoch 28, batch 1000, loss[loss=0.01395, audio_tagging_loss=0.01395, over 22117.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4913209.02 frames. ], batch size: 107, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:41:20,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=864546.6666666666, ans=10.0
2023-12-23 00:41:46,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=864746.6666666666, ans=0.2
2023-12-23 00:41:57,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=864813.3333333334, ans=0.0
2023-12-23 00:41:59,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=864813.3333333334, ans=0.0
2023-12-23 00:42:05,071 INFO [train.py:886] (1/4) Epoch 28, batch 1050, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4918547.47 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:42:26,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=865013.3333333334, ans=0.04949747468305833
2023-12-23 00:42:30,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=865013.3333333334, ans=0.125
2023-12-23 00:42:31,251 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.791e+01 3.118e+01 3.211e+01 3.407e+01 4.036e+01, threshold=6.422e+01, percent-clipped=0.0
2023-12-23 00:42:40,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=865080.0, ans=0.0
2023-12-23 00:42:54,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2023-12-23 00:42:56,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0
2023-12-23 00:42:56,963 INFO [train.py:886] (1/4) Epoch 28, batch 1100, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4927089.28 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0
2023-12-23 00:42:57,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0
2023-12-23 00:43:19,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865346.6666666666, ans=0.1
2023-12-23 00:43:48,809 INFO [train.py:886] (1/4) Epoch 28, batch 1150, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4932926.83 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:43:56,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=865546.6666666666, ans=0.0
2023-12-23 00:43:57,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=15.0
2023-12-23 00:44:02,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865613.3333333334, ans=0.1
2023-12-23 00:44:04,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=865613.3333333334, ans=0.125
2023-12-23 00:44:08,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=865680.0, ans=0.125
2023-12-23 00:44:14,776 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.161e+01 3.263e+01 3.387e+01 3.814e+01, threshold=6.527e+01, percent-clipped=0.0
2023-12-23 00:44:38,946 INFO [train.py:886] (1/4) Epoch 28, batch 1200, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4934800.06 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:44:41,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=865880.0, ans=0.125
2023-12-23 00:44:51,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=12.0
2023-12-23 00:45:30,755 INFO [train.py:886] (1/4) Epoch 28, batch 1250, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4936311.52 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:45:31,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=866213.3333333334, ans=0.95
2023-12-23 00:45:56,743 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+01 3.240e+01 3.398e+01 3.546e+01 3.975e+01, threshold=6.796e+01, percent-clipped=0.0
2023-12-23 00:45:57,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=866346.6666666666, ans=0.1
2023-12-23 00:45:57,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=866346.6666666666, ans=0.125
2023-12-23 00:46:10,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2023-12-23 00:46:21,501 INFO [train.py:886] (1/4) Epoch 28, batch 1300, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4938396.75 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:46:21,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0
2023-12-23 00:46:32,664 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 00:46:35,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=12.0
2023-12-23 00:46:38,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=866613.3333333334, ans=0.04949747468305833
2023-12-23 00:46:46,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=866680.0, ans=0.0
2023-12-23 00:46:48,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0
2023-12-23 00:47:12,303 INFO [train.py:886] (1/4) Epoch 28, batch 1350, loss[loss=0.009836, audio_tagging_loss=0.009836, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4939193.44 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:47:13,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=866880.0, ans=0.125
2023-12-23 00:47:15,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=866880.0, ans=0.125
2023-12-23 00:47:22,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0
2023-12-23 00:47:29,038 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 00:47:31,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=867013.3333333334, ans=0.1
2023-12-23 00:47:31,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=867013.3333333334, ans=0.0
2023-12-23 00:47:33,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2023-12-23 00:47:36,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=867013.3333333334, ans=0.125
2023-12-23 00:47:39,127 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.136e+01 3.279e+01 3.497e+01 4.059e+01, threshold=6.558e+01, percent-clipped=0.0
2023-12-23 00:47:40,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=867013.3333333334, ans=0.125
2023-12-23 00:47:54,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=867146.6666666666, ans=0.0
2023-12-23 00:47:56,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0
2023-12-23 00:47:58,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=867146.6666666666, ans=0.0
2023-12-23 00:47:59,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=867146.6666666666, ans=0.1
2023-12-23 00:48:02,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=867213.3333333334, ans=0.0
2023-12-23 00:48:02,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=867213.3333333334, ans=0.0
2023-12-23 00:48:03,365 INFO [train.py:886] (1/4) Epoch 28, batch 1400, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4947156.52 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:48:09,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=867213.3333333334, ans=0.125
2023-12-23 00:48:29,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867346.6666666666, ans=0.1
2023-12-23 00:48:29,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0
2023-12-23 00:48:40,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867413.3333333334, ans=0.0
2023-12-23 00:48:54,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=867546.6666666666, ans=0.125
2023-12-23 00:48:54,806 INFO [train.py:886] (1/4) Epoch 28, batch 1450, loss[loss=0.01161, audio_tagging_loss=0.01161, over 22296.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4946749.76 frames. ], batch size: 107, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:49:06,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=867613.3333333334, ans=0.125
2023-12-23 00:49:22,063 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.828e+01 3.154e+01 3.288e+01 3.472e+01 3.862e+01, threshold=6.577e+01, percent-clipped=0.0
2023-12-23 00:49:26,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=867746.6666666666, ans=0.2
2023-12-23 00:49:28,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=867746.6666666666, ans=0.0
2023-12-23 00:49:42,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867813.3333333334, ans=0.125
2023-12-23 00:49:46,854 INFO [train.py:886] (1/4) Epoch 28, batch 1500, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4953848.01 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:49:53,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0
2023-12-23 00:49:56,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=867946.6666666666, ans=0.04949747468305833
2023-12-23 00:49:59,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867946.6666666666, ans=0.0
2023-12-23 00:50:01,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=867946.6666666666, ans=0.125
2023-12-23 00:50:23,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=868080.0, ans=0.0
2023-12-23 00:50:29,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0
2023-12-23 00:50:39,577 INFO [train.py:886] (1/4) Epoch 28, batch 1550, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4949603.22 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:50:58,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=868346.6666666666, ans=0.05
2023-12-23 00:51:04,679 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.216e+01 3.352e+01 3.508e+01 3.923e+01, threshold=6.705e+01, percent-clipped=0.0
2023-12-23 00:51:17,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868413.3333333334, ans=0.1
2023-12-23 00:51:22,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=868480.0, ans=0.0
2023-12-23 00:51:29,617 INFO [train.py:886] (1/4) Epoch 28, batch 1600, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4941505.57 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:51:45,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=868613.3333333334, ans=0.0
2023-12-23 00:52:02,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=868746.6666666666, ans=0.2
2023-12-23 00:52:17,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=868813.3333333334, ans=0.125
2023-12-23 00:52:20,942 INFO [train.py:886] (1/4) Epoch 28, batch 1650, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4940629.57 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:52:40,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=869013.3333333334, ans=0.0
2023-12-23 00:52:47,486 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.157e+01 3.317e+01 3.477e+01 3.959e+01, threshold=6.634e+01, percent-clipped=0.0
2023-12-23 00:52:58,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=869080.0, ans=0.125
2023-12-23 00:53:01,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=869146.6666666666, ans=0.04949747468305833
2023-12-23 00:53:02,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=869146.6666666666, ans=0.125
2023-12-23 00:53:11,752 INFO [train.py:886] (1/4) Epoch 28, batch 1700, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4943544.89 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:53:22,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=869280.0, ans=0.05
2023-12-23 00:53:30,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=869346.6666666666, ans=0.0
2023-12-23 00:53:35,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=869346.6666666666, ans=0.125
2023-12-23 00:53:38,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=869346.6666666666, ans=0.125
2023-12-23 00:53:41,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869413.3333333334, ans=0.125
2023-12-23 00:53:49,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=869413.3333333334, ans=0.2
2023-12-23 00:54:02,270 INFO [train.py:886] (1/4) Epoch 28, batch 1750, loss[loss=0.01473, audio_tagging_loss=0.01473, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4946837.32 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:54:04,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=869546.6666666666, ans=0.0
2023-12-23 00:54:08,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=869546.6666666666, ans=0.0
2023-12-23 00:54:11,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0
2023-12-23 00:54:24,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=869680.0, ans=0.125
2023-12-23 00:54:24,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=869680.0, ans=0.125
2023-12-23 00:54:28,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=869680.0, ans=0.0
2023-12-23 00:54:29,677 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.178e+01 3.276e+01 3.403e+01 3.949e+01, threshold=6.552e+01, percent-clipped=0.0
2023-12-23 00:54:31,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0
2023-12-23 00:54:32,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=869746.6666666666, ans=0.0
2023-12-23 00:54:35,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2023-12-23 00:54:53,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=869880.0, ans=0.125
2023-12-23 00:54:54,484 INFO [train.py:886] (1/4) Epoch 28, batch 1800, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4952500.85 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0
2023-12-23 00:55:30,820 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 00:55:43,838 INFO [train.py:886] (1/4) Epoch 28, batch 1850, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4956701.30 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0
2023-12-23 00:55:58,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=870280.0, ans=0.125
2023-12-23 00:56:03,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870280.0, ans=0.125
2023-12-23 00:56:03,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870280.0, ans=0.1
2023-12-23 00:56:11,270 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+01 3.215e+01 3.384e+01 3.493e+01 4.271e+01, threshold=6.769e+01, percent-clipped=0.0
2023-12-23 00:56:16,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=870413.3333333334, ans=0.0
2023-12-23 00:56:36,128 INFO [train.py:886] (1/4) Epoch 28, batch 1900, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4951213.81 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 32.0
2023-12-23 00:56:41,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5
2023-12-23 00:56:47,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=870613.3333333334, ans=0.125
2023-12-23 00:56:48,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=870613.3333333334, ans=0.2
2023-12-23 00:56:49,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=870613.3333333334, ans=0.125
2023-12-23 00:56:52,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870613.3333333334, ans=0.1
2023-12-23 00:57:03,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=870680.0, ans=0.04949747468305833
2023-12-23 00:57:27,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=870813.3333333334, ans=0.125
2023-12-23 00:57:28,666 INFO [train.py:886] (1/4) Epoch 28, batch 1950, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4949806.59 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0
2023-12-23 00:57:50,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=871013.3333333334, ans=0.125
2023-12-23 00:57:53,804 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.143e+01 3.252e+01 3.432e+01 3.799e+01, threshold=6.504e+01, percent-clipped=0.0
2023-12-23 00:58:10,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=871146.6666666666, ans=0.0
2023-12-23 00:58:14,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.12 vs. limit=10.0
2023-12-23 00:58:15,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=871146.6666666666, ans=10.0
2023-12-23 00:58:18,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=871213.3333333334, ans=0.0
2023-12-23 00:58:19,551 INFO [train.py:886] (1/4) Epoch 28, batch 2000, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4947446.10 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 00:58:33,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=871280.0, ans=0.0
2023-12-23 00:58:35,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=871280.0, ans=0.0
2023-12-23 00:58:55,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=871413.3333333334, ans=0.125
2023-12-23 00:59:02,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=871480.0, ans=0.0
2023-12-23 00:59:10,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2023-12-23 00:59:11,805 INFO [train.py:886] (1/4) Epoch 28, batch 2050, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24076.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4950009.19 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 00:59:12,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=871546.6666666666, ans=0.0
2023-12-23 00:59:24,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=871613.3333333334, ans=0.07
2023-12-23 00:59:38,688 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.794e+01 3.157e+01 3.303e+01 3.497e+01 3.860e+01, threshold=6.607e+01, percent-clipped=0.0
2023-12-23 00:59:40,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=871680.0, ans=0.125
2023-12-23 00:59:42,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=871746.6666666666, ans=0.125
2023-12-23 00:59:46,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=871746.6666666666, ans=0.0
2023-12-23 00:59:50,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0
2023-12-23 01:00:02,003 INFO [train.py:886] (1/4) Epoch 28, batch 2100, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4957829.33 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:00:02,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=871880.0, ans=0.2
2023-12-23 01:00:07,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=871880.0, ans=0.125
2023-12-23 01:00:09,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=871880.0, ans=0.0
2023-12-23 01:00:14,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5
2023-12-23 01:00:16,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=871946.6666666666, ans=0.2
2023-12-23 01:00:17,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=871946.6666666666, ans=0.035
2023-12-23 01:00:19,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=871946.6666666666, ans=0.2
2023-12-23 01:00:21,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871946.6666666666, ans=0.1
2023-12-23 01:00:23,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=872013.3333333334, ans=0.125
2023-12-23 01:00:54,490 INFO [train.py:886] (1/4) Epoch 28, batch 2150, loss[loss=0.01717, audio_tagging_loss=0.01717, over 24943.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4958310.58 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:01:21,867 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.234e+01 3.354e+01 3.492e+01 4.264e+01, threshold=6.708e+01, percent-clipped=0.0
2023-12-23 01:01:38,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=872480.0, ans=0.1
2023-12-23 01:01:44,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=872480.0, ans=0.0
2023-12-23 01:01:46,231 INFO [train.py:886] (1/4) Epoch 28, batch 2200, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4954226.50 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:01:52,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872546.6666666666, ans=0.1
2023-12-23 01:01:59,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=872613.3333333334, ans=0.125
2023-12-23 01:02:11,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0
2023-12-23 01:02:21,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=872746.6666666666, ans=0.125
2023-12-23 01:02:29,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=872813.3333333334, ans=0.0
2023-12-23 01:02:33,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=872813.3333333334, ans=0.2
2023-12-23 01:02:34,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=872813.3333333334, ans=0.0
2023-12-23 01:02:38,066 INFO [train.py:886] (1/4) Epoch 28, batch 2250, loss[loss=0.01493, audio_tagging_loss=0.01493, over 22306.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4942938.48 frames. ], batch size: 107, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:02:39,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872880.0, ans=0.1
2023-12-23 01:02:43,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=872880.0, ans=0.07
2023-12-23 01:02:51,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872946.6666666666, ans=0.1
2023-12-23 01:02:54,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0
2023-12-23 01:03:04,779 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.837e+01 3.155e+01 3.335e+01 3.468e+01 4.219e+01, threshold=6.670e+01, percent-clipped=0.0
2023-12-23 01:03:12,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=12.0
2023-12-23 01:03:14,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=873080.0, ans=0.0
2023-12-23 01:03:17,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=873080.0, ans=10.0
2023-12-23 01:03:30,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=873213.3333333334, ans=0.2
2023-12-23 01:03:30,848 INFO [train.py:886] (1/4) Epoch 28, batch 2300, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4948095.19 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:03:32,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=873213.3333333334, ans=0.125
2023-12-23 01:03:39,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=873280.0, ans=0.2
2023-12-23 01:03:49,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873280.0, ans=0.125
2023-12-23 01:04:15,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=873480.0, ans=0.0
2023-12-23 01:04:22,667 INFO [train.py:886] (1/4) Epoch 28, batch 2350, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4946482.82 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:04:31,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=873546.6666666666, ans=0.2
2023-12-23 01:04:48,803 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.776e+01 3.130e+01 3.254e+01 3.391e+01 3.968e+01, threshold=6.508e+01, percent-clipped=0.0
2023-12-23 01:05:05,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=873813.3333333334, ans=0.2
2023-12-23 01:05:08,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=873813.3333333334, ans=0.0
2023-12-23 01:05:09,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873813.3333333334, ans=0.1
2023-12-23 01:05:13,637 INFO [train.py:886] (1/4) Epoch 28, batch 2400, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4947559.06 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:05:20,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0
2023-12-23 01:05:22,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873946.6666666666, ans=0.125
2023-12-23 01:05:22,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0
2023-12-23 01:05:24,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=873946.6666666666, ans=0.0
2023-12-23 01:05:44,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874080.0, ans=0.1
2023-12-23 01:05:51,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=874080.0, ans=0.125
2023-12-23 01:06:01,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=874146.6666666666, ans=0.125
2023-12-23 01:06:03,943 INFO [train.py:886] (1/4) Epoch 28, batch 2450, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4947326.82 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0
2023-12-23 01:06:06,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=12.0
2023-12-23 01:06:20,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=874280.0, ans=0.125
2023-12-23 01:06:25,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=874346.6666666666, ans=0.0
2023-12-23 01:06:31,241 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.962e+01 3.185e+01 3.322e+01 3.528e+01 4.944e+01, threshold=6.645e+01, percent-clipped=0.0
2023-12-23 01:06:43,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=15.0
2023-12-23 01:06:51,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=874480.0, ans=0.125
2023-12-23 01:06:52,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=874480.0, ans=0.05
2023-12-23 01:06:55,573 INFO [train.py:886] (1/4) Epoch 28, batch 2500, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4946823.67 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:06:55,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=874546.6666666666, ans=0.035
2023-12-23 01:06:57,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0
2023-12-23 01:07:14,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874613.3333333334, ans=0.0
2023-12-23 01:07:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=874680.0, ans=0.0
2023-12-23 01:07:20,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=874680.0, ans=0.125
2023-12-23 01:07:24,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=874680.0, ans=0.05
2023-12-23 01:07:25,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874746.6666666666, ans=0.1
2023-12-23 01:07:27,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=874746.6666666666, ans=0.125
2023-12-23 01:07:32,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=874746.6666666666, ans=0.125
2023-12-23 01:07:34,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=874746.6666666666, ans=0.09899494936611666
2023-12-23 01:07:38,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=874813.3333333334, ans=0.125
2023-12-23 01:07:46,632 INFO [train.py:886] (1/4) Epoch 28, batch 2550, loss[loss=0.01369, audio_tagging_loss=0.01369, over 24750.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4944415.98 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:07:47,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0
2023-12-23 01:07:51,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=874880.0, ans=0.07
2023-12-23 01:07:56,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=874946.6666666666, ans=0.125
2023-12-23 01:08:06,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874946.6666666666, ans=0.1
2023-12-23 01:08:13,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0
2023-12-23 01:08:14,111 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.835e+01 3.213e+01 3.387e+01 3.518e+01 3.975e+01, threshold=6.773e+01, percent-clipped=0.0
2023-12-23 01:08:18,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875080.0, ans=0.1
2023-12-23 01:08:26,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=875080.0, ans=0.2
2023-12-23 01:08:38,591 INFO [train.py:886] (1/4) Epoch 28, batch 2600, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4944775.72 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:08:44,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. limit=10.0
2023-12-23 01:08:52,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=875280.0, ans=0.125
2023-12-23 01:09:06,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=875346.6666666666, ans=0.1
2023-12-23 01:09:12,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.92 vs. limit=10.0
2023-12-23 01:09:18,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=875413.3333333334, ans=0.0
2023-12-23 01:09:30,271 INFO [train.py:886] (1/4) Epoch 28, batch 2650, loss[loss=0.01526, audio_tagging_loss=0.01526, over 25000.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4952963.47 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:09:52,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=875680.0, ans=0.0
2023-12-23 01:09:56,414 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.754e+01 3.093e+01 3.250e+01 3.446e+01 3.869e+01, threshold=6.500e+01, percent-clipped=0.0
2023-12-23 01:10:04,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=875746.6666666666, ans=0.2
2023-12-23 01:10:21,859 INFO [train.py:886] (1/4) Epoch 28, batch 2700, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4953469.94 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:10:37,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0
2023-12-23 01:11:03,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=876146.6666666666, ans=0.2
2023-12-23 01:11:10,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=876146.6666666666, ans=0.1
2023-12-23 01:11:12,604 INFO [train.py:886] (1/4) Epoch 28, batch 2750, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4957475.45 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:11:19,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=876213.3333333334, ans=0.2
2023-12-23 01:11:38,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=876346.6666666666, ans=0.125
2023-12-23 01:11:39,282 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.779e+01 3.198e+01 3.346e+01 3.453e+01 3.797e+01, threshold=6.692e+01, percent-clipped=0.0
2023-12-23 01:12:04,153 INFO [train.py:886] (1/4) Epoch 28, batch 2800, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4954704.03 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:12:14,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=876613.3333333334, ans=0.125
2023-12-23 01:12:23,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=12.0
2023-12-23 01:12:29,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876680.0, ans=0.0
2023-12-23 01:12:46,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=876813.3333333334, ans=0.0
2023-12-23 01:12:56,768 INFO [train.py:886] (1/4) Epoch 28, batch 2850, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4945853.63 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:13:00,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=876880.0, ans=15.0
2023-12-23 01:13:08,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876946.6666666666, ans=0.125
2023-12-23 01:13:20,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=877013.3333333334, ans=0.125
2023-12-23 01:13:23,946 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.205e+01 3.349e+01 3.516e+01 3.954e+01, threshold=6.699e+01, percent-clipped=0.0
2023-12-23 01:13:26,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=877013.3333333334, ans=15.0
2023-12-23 01:13:47,388 INFO [train.py:886] (1/4) Epoch 28, batch 2900, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4944422.72 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:14:02,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877280.0, ans=0.1
2023-12-23 01:14:06,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=877280.0, ans=0.0
2023-12-23 01:14:18,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=877413.3333333334, ans=0.125
2023-12-23 01:14:26,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877413.3333333334, ans=0.125
2023-12-23 01:14:41,108 INFO [train.py:886] (1/4) Epoch 28, batch 2950, loss[loss=0.009986, audio_tagging_loss=0.009986, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4945540.68 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:15:01,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=877680.0, ans=0.0
2023-12-23 01:15:08,594 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.109e+01 3.285e+01 3.424e+01 3.829e+01, threshold=6.571e+01, percent-clipped=0.0
2023-12-23 01:15:33,435 INFO [train.py:886] (1/4) Epoch 28, batch 3000, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4946578.35 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:15:33,436 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 01:15:44,123 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1502, 1.1509, 4.4167, 4.2957], device='cuda:1')
2023-12-23 01:15:54,366 INFO [train.py:917] (1/4) Epoch 28, validation: loss=0.03338, audio_tagging_loss=0.03338, over 3737520.00 frames.
2023-12-23 01:15:54,367 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 01:16:09,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877946.6666666666, ans=0.125
2023-12-23 01:16:20,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=878013.3333333334, ans=0.125
2023-12-23 01:16:37,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=878146.6666666666, ans=0.2
2023-12-23 01:16:46,342 INFO [train.py:886] (1/4) Epoch 28, batch 3050, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24019.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4948343.15 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:16:47,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5
2023-12-23 01:17:14,049 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.873e+01 3.142e+01 3.296e+01 3.491e+01 4.100e+01, threshold=6.591e+01, percent-clipped=0.0
2023-12-23 01:17:38,123 INFO [train.py:886] (1/4) Epoch 28, batch 3100, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4949849.67 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:17:40,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=878546.6666666666, ans=0.1
2023-12-23 01:17:41,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=878546.6666666666, ans=0.0
2023-12-23 01:17:41,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=878546.6666666666, ans=0.2
2023-12-23 01:17:47,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=878546.6666666666, ans=0.125
2023-12-23 01:17:49,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=878613.3333333334, ans=0.1
2023-12-23 01:17:58,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=878680.0, ans=10.0
2023-12-23 01:18:00,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5
2023-12-23 01:18:03,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0
2023-12-23 01:18:05,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0
2023-12-23 01:18:07,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=878680.0, ans=0.125
2023-12-23 01:18:12,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878746.6666666666, ans=0.1
2023-12-23 01:18:17,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=878746.6666666666, ans=0.09899494936611666
2023-12-23 01:18:29,896 INFO [train.py:886] (1/4) Epoch 28, batch 3150, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4946349.26 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0
2023-12-23 01:18:32,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0
2023-12-23 01:18:33,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=878880.0, ans=0.125
2023-12-23 01:18:50,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=879013.3333333334, ans=0.2
2023-12-23 01:18:57,080 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.894e+01 3.254e+01 3.356e+01 3.478e+01 4.076e+01, threshold=6.712e+01, percent-clipped=0.0
2023-12-23 01:19:01,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=879080.0, ans=0.125
2023-12-23 01:19:15,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=879146.6666666666, ans=0.125
2023-12-23 01:19:22,611 INFO [train.py:886] (1/4) Epoch 28, batch 3200, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4943162.32 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:19:24,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=879213.3333333334, ans=0.125
2023-12-23 01:19:26,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=879213.3333333334, ans=0.0
2023-12-23 01:19:39,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=879280.0, ans=0.95
2023-12-23 01:19:45,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2023-12-23 01:20:13,635 INFO [train.py:886] (1/4) Epoch 28, batch 3250, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4948329.77 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:20:17,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=879546.6666666666, ans=0.125
2023-12-23 01:20:24,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0
2023-12-23 01:20:25,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=879613.3333333334, ans=0.0
2023-12-23 01:20:38,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=879680.0, ans=0.015
2023-12-23 01:20:40,467 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.146e+01 3.294e+01 3.408e+01 4.034e+01, threshold=6.589e+01, percent-clipped=0.0
2023-12-23 01:20:56,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=879813.3333333334, ans=0.0
2023-12-23 01:21:01,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=879813.3333333334, ans=0.0
2023-12-23 01:21:05,287 INFO [train.py:886] (1/4) Epoch 28, batch 3300, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4950879.42 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:21:41,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=880080.0, ans=10.0
2023-12-23 01:21:43,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=880080.0, ans=0.125
2023-12-23 01:21:55,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.04 vs. limit=22.5
2023-12-23 01:21:59,418 INFO [train.py:886] (1/4) Epoch 28, batch 3350, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4946192.97 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:22:00,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880213.3333333334, ans=0.125
2023-12-23 01:22:18,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=880280.0, ans=0.125
2023-12-23 01:22:24,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=880346.6666666666, ans=0.125
2023-12-23 01:22:26,189 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.184e+01 3.328e+01 3.464e+01 4.025e+01, threshold=6.657e+01, percent-clipped=0.0
2023-12-23 01:22:39,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=880413.3333333334, ans=0.125
2023-12-23 01:22:48,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880480.0, ans=0.1
2023-12-23 01:22:50,986 INFO [train.py:886] (1/4) Epoch 28, batch 3400, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4948533.12 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0
2023-12-23 01:22:57,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880546.6666666666, ans=0.1
2023-12-23 01:23:03,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=880613.3333333334, ans=0.0
2023-12-23 01:23:14,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=880680.0, ans=0.0
2023-12-23 01:23:35,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0
2023-12-23 01:23:36,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=880813.3333333334, ans=10.0
2023-12-23 01:23:41,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=880813.3333333334, ans=0.2
2023-12-23 01:23:43,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5
2023-12-23 01:23:43,691 INFO [train.py:886] (1/4) Epoch 28, batch 3450, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames.
], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4951851.61 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:23:52,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880946.6666666666, ans=0.1 2023-12-23 01:24:08,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.65 vs. limit=10.0 2023-12-23 01:24:10,421 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+01 3.236e+01 3.395e+01 3.566e+01 3.957e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 01:24:12,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=881013.3333333334, ans=0.2 2023-12-23 01:24:24,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=881146.6666666666, ans=0.125 2023-12-23 01:24:28,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=881146.6666666666, ans=0.0 2023-12-23 01:24:29,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2023-12-23 01:24:35,511 INFO [train.py:886] (1/4) Epoch 28, batch 3500, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4944730.39 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:24:56,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=881346.6666666666, ans=6.0 2023-12-23 01:25:02,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=15.0 2023-12-23 01:25:10,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-12-23 01:25:13,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-12-23 01:25:17,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=881480.0, ans=0.125 2023-12-23 01:25:27,305 INFO [train.py:886] (1/4) Epoch 28, batch 3550, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4945001.37 frames. 
], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:25:28,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=881546.6666666666, ans=0.125 2023-12-23 01:25:39,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=881613.3333333334, ans=0.0 2023-12-23 01:25:54,594 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.124e+01 3.262e+01 3.428e+01 4.045e+01, threshold=6.525e+01, percent-clipped=0.0 2023-12-23 01:25:54,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=881680.0, ans=0.0 2023-12-23 01:25:55,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=881680.0, ans=0.125 2023-12-23 01:26:08,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=881813.3333333334, ans=0.0 2023-12-23 01:26:19,137 INFO [train.py:886] (1/4) Epoch 28, batch 3600, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4938410.52 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:26:20,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=881880.0, ans=0.0 2023-12-23 01:26:48,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=12.0 2023-12-23 01:27:04,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-12-23 01:27:07,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-23 01:27:08,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=882146.6666666666, ans=0.2 2023-12-23 01:27:10,018 INFO [train.py:886] (1/4) Epoch 28, batch 3650, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4940693.42 frames. 
], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:27:19,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=882213.3333333334, ans=0.125 2023-12-23 01:27:33,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=882346.6666666666, ans=0.2 2023-12-23 01:27:36,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=882346.6666666666, ans=0.2 2023-12-23 01:27:37,444 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.830e+01 3.117e+01 3.248e+01 3.424e+01 3.950e+01, threshold=6.496e+01, percent-clipped=0.0 2023-12-23 01:27:37,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=882346.6666666666, ans=0.0 2023-12-23 01:27:40,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=882413.3333333334, ans=0.125 2023-12-23 01:27:44,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=882413.3333333334, ans=0.125 2023-12-23 01:27:45,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=882413.3333333334, ans=0.1 2023-12-23 01:27:52,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=882480.0, ans=0.0 2023-12-23 01:27:55,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=882480.0, ans=0.04949747468305833 2023-12-23 01:27:59,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=882480.0, ans=0.0 2023-12-23 01:28:02,357 INFO [train.py:886] (1/4) Epoch 28, batch 3700, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4950824.33 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:28:13,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=882613.3333333334, ans=0.125 2023-12-23 01:28:14,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2023-12-23 01:28:15,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=882613.3333333334, ans=0.0 2023-12-23 01:28:19,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=882613.3333333334, ans=0.2 2023-12-23 01:28:32,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2023-12-23 01:28:41,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=882746.6666666666, ans=0.125 2023-12-23 01:28:54,372 INFO [train.py:886] (1/4) Epoch 28, batch 3750, loss[loss=0.01662, audio_tagging_loss=0.01662, over 24948.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4953140.12 frames. 
], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:28:55,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=882880.0, ans=0.125 2023-12-23 01:29:02,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=882880.0, ans=0.125 2023-12-23 01:29:07,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=882946.6666666666, ans=0.2 2023-12-23 01:29:21,070 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.212e+01 3.329e+01 3.465e+01 4.032e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-23 01:29:35,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883146.6666666666, ans=0.125 2023-12-23 01:29:38,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=883146.6666666666, ans=0.0 2023-12-23 01:29:45,895 INFO [train.py:886] (1/4) Epoch 28, batch 3800, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4946906.74 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:29:46,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=883213.3333333334, ans=0.125 2023-12-23 01:29:51,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=883213.3333333334, ans=0.125 2023-12-23 01:29:57,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=8.0 2023-12-23 01:30:05,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=883280.0, ans=0.05 2023-12-23 01:30:09,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=883346.6666666666, ans=0.0 2023-12-23 01:30:14,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2023-12-23 01:30:31,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883480.0, ans=0.125 2023-12-23 01:30:38,185 INFO [train.py:886] (1/4) Epoch 28, batch 3850, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4941199.59 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:30:38,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=883546.6666666666, ans=0.0 2023-12-23 01:30:47,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883613.3333333334, ans=0.125 2023-12-23 01:31:04,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=15.0
2023-12-23 01:31:05,059 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.155e+01 3.335e+01 3.521e+01 4.328e+01, threshold=6.670e+01, percent-clipped=0.0
2023-12-23 01:31:29,959 INFO [train.py:886] (1/4) Epoch 28, batch 3900, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4949862.39 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:31:41,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=883946.6666666666, ans=0.125
2023-12-23 01:31:42,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=883946.6666666666, ans=0.125
2023-12-23 01:31:44,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0
2023-12-23 01:32:03,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=884080.0, ans=0.125
2023-12-23 01:32:21,906 INFO [train.py:886] (1/4) Epoch 28, batch 3950, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4954621.78 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:32:26,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.99 vs. limit=22.5
2023-12-23 01:32:49,272 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.873e+01 3.204e+01 3.330e+01 3.435e+01 3.816e+01, threshold=6.660e+01, percent-clipped=0.0
2023-12-23 01:32:50,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=884346.6666666666, ans=0.05
2023-12-23 01:32:53,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=884413.3333333334, ans=0.0
2023-12-23 01:32:53,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5
2023-12-23 01:33:03,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=884480.0, ans=0.125
2023-12-23 01:33:13,913 INFO [train.py:886] (1/4) Epoch 28, batch 4000, loss[loss=0.0112, audio_tagging_loss=0.0112, over 22114.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4957267.65 frames. ], batch size: 107, lr: 3.83e-03, grad_scale: 128.0
2023-12-23 01:33:43,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=884746.6666666666, ans=0.0
2023-12-23 01:33:46,248 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:34:02,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=884813.3333333334, ans=0.125
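Note the grad_scale field in the train.py lines just above: it sits at 64.0 through most of epoch 28, doubles to 128.0 at batch 4000, and is back at 64.0 by batch 4050. That pattern is the usual signature of dynamic loss scaling in mixed-precision training: grow the scale after a run of overflow-free batches, halve it as soon as a batch overflows. A minimal sketch of that scheme, with illustrative class and parameter names rather than the actual icefall implementation:

# Sketch of dynamic loss scaling, assuming the standard scheme:
# halve the scale when a batch produces inf/nan gradients, double it
# after a fixed number of clean batches. Names are hypothetical.
class DynamicGradScaler:
    def __init__(self, init_scale=64.0, growth_interval=4000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.clean_batches = 0

    def update(self, found_inf: bool):
        if found_inf:
            self.scale /= 2.0          # back off on overflow
            self.clean_batches = 0
        else:
            self.clean_batches += 1
            if self.clean_batches >= self.growth_interval:
                self.scale *= 2.0      # probe a larger scale again
                self.clean_batches = 0

Under this reading, the 128.0 at batch 4000 is a probe after 4000 clean batches, and the return to 64.0 a few batches later means the larger scale overflowed at least once.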
2023-12-23 01:34:03,581 INFO [train.py:886] (1/4) Epoch 28, batch 4050, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4956974.48 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:34:04,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884880.0, ans=0.1
2023-12-23 01:34:05,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5
2023-12-23 01:34:14,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884946.6666666666, ans=0.1
2023-12-23 01:34:21,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=884946.6666666666, ans=0.0
2023-12-23 01:34:31,374 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.870e+01 3.168e+01 3.309e+01 3.446e+01 3.884e+01, threshold=6.618e+01, percent-clipped=0.0
2023-12-23 01:34:36,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=885080.0, ans=0.125
2023-12-23 01:34:39,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=885080.0, ans=0.0
2023-12-23 01:34:40,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=885080.0, ans=0.0
2023-12-23 01:34:46,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=885146.6666666666, ans=0.0
2023-12-23 01:34:55,294 INFO [train.py:886] (1/4) Epoch 28, batch 4100, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24035.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4954454.21 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:35:02,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=885213.3333333334, ans=0.0
2023-12-23 01:35:14,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=885346.6666666666, ans=0.0
2023-12-23 01:35:16,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=885346.6666666666, ans=0.5
2023-12-23 01:35:17,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885346.6666666666, ans=0.1
2023-12-23 01:35:24,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=885346.6666666666, ans=0.125
2023-12-23 01:35:34,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885480.0, ans=0.1
2023-12-23 01:35:40,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.70 vs. limit=15.0
2023-12-23 01:35:42,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=885480.0, ans=0.0
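The ScheduledFloat entries record module constants (dropout probabilities, skip rates, balancer probabilities) whose current value ans is a function of batch_count. One plausible reading, sketched below, is piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are hypothetical, and only the interpolation scheme is the point:

# Toy stand-in for a ScheduledFloat: a value that is piecewise-linear
# in batch_count. The breakpoints below are made up for illustration.
def scheduled_float(batch_count, points):
    # points: sorted list of (batch_count, value) pairs
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]  # hold the last value past the final breakpoint

dropout_schedule = [(0.0, 0.3), (20000.0, 0.1)]   # hypothetical schedule
print(scheduled_float(885000.0, dropout_schedule))  # -> 0.1 this late in training

This is consistent with what the log shows: dropout_p values start at 0.3 near batch_count=0 and sit at 0.1 by this stage, while the various skip rates have decayed to 0.0.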
2023-12-23 01:35:46,243 INFO [train.py:886] (1/4) Epoch 28, batch 4150, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4945818.80 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:36:02,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0
2023-12-23 01:36:09,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2023-12-23 01:36:12,416 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.213e+01 3.310e+01 3.521e+01 4.267e+01, threshold=6.621e+01, percent-clipped=0.0
2023-12-23 01:36:35,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0
2023-12-23 01:36:37,005 INFO [train.py:886] (1/4) Epoch 28, batch 4200, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4947721.09 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:36:38,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=885880.0, ans=0.125
2023-12-23 01:37:02,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886013.3333333334, ans=0.1
2023-12-23 01:37:11,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=886080.0, ans=0.0
2023-12-23 01:37:11,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0
2023-12-23 01:37:13,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=886080.0, ans=0.0
2023-12-23 01:37:13,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5
2023-12-23 01:37:26,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0
2023-12-23 01:37:26,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
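In the optim.py WARNING lines, the five grad-norm quartiles are the min/25%/median/75%/max of recently observed gradient norms, and the printed threshold is consistently about 2.0x the median (in the warning above, threshold=6.621e+01 against a 3.310e+01 median, matching Clipping_scale=2.0). A sketch of how such a report could be computed; this is a reconstruction from the log format, not icefall's actual optimizer code:

# Sketch of a quartile-style clipping report, assuming the threshold
# is clipping_scale times the median of recent gradient norms.
import torch

def clip_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D tensor of per-batch gradient norms from a recent window
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                    # 2.0 * median
    pct_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, pct_clipped

With a threshold at twice the median, percent-clipped=0.0 in almost every warning here indicates a tight, stable grad-norm distribution at this point in training.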
2023-12-23 01:37:29,997 INFO [train.py:886] (1/4) Epoch 28, batch 4250, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4944564.54 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:37:32,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=886213.3333333334, ans=0.1
2023-12-23 01:37:57,038 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.165e+01 3.374e+01 3.516e+01 4.346e+01, threshold=6.749e+01, percent-clipped=0.0
2023-12-23 01:38:00,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=886413.3333333334, ans=0.0
2023-12-23 01:38:11,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=886480.0, ans=0.125
2023-12-23 01:38:11,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2023-12-23 01:38:19,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=886546.6666666666, ans=0.0
2023-12-23 01:38:19,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=886546.6666666666, ans=0.05
2023-12-23 01:38:20,010 INFO [train.py:886] (1/4) Epoch 28, batch 4300, loss[loss=0.01419, audio_tagging_loss=0.01419, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4947238.24 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:38:29,361 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 01:38:43,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=886680.0, ans=0.2
2023-12-23 01:38:46,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=886680.0, ans=0.125
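The Whitening lines compare a per-module metric against a scheduled limit, and most entries here sit just under their limit, suggesting the corrective penalty only activates when the metric exceeds it. The sketch below uses a stand-in metric, the largest eigenvalue of the feature covariance relative to the mean eigenvalue; icefall's actual metric differs, but the compare-against-limit control flow is the point:

# Sketch of a whitening check with a hypothetical conditioning metric.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)          # ascending eigenvalues
    return eigs.max() / eigs.mean().clamp(min=1e-20)

def maybe_penalize(x: torch.Tensor, limit: float = 15.0) -> torch.Tensor:
    metric = whitening_metric(x)
    if metric > limit:            # only penalize badly conditioned features
        return metric - limit     # toy penalty; the real one is differentiable
    return torch.zeros(())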
2023-12-23 01:39:13,175 INFO [train.py:886] (1/4) Epoch 28, batch 4350, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4947400.98 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:39:24,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=886946.6666666666, ans=0.2
2023-12-23 01:39:30,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=886946.6666666666, ans=0.0
2023-12-23 01:39:33,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887013.3333333334, ans=0.1
2023-12-23 01:39:41,274 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.916e+01 3.191e+01 3.325e+01 3.447e+01 4.229e+01, threshold=6.650e+01, percent-clipped=0.0
2023-12-23 01:39:42,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=887013.3333333334, ans=0.015
2023-12-23 01:39:47,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=887080.0, ans=0.0
2023-12-23 01:39:54,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=887146.6666666666, ans=0.09899494936611666
2023-12-23 01:39:57,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887146.6666666666, ans=0.1
2023-12-23 01:40:04,486 INFO [train.py:886] (1/4) Epoch 28, batch 4400, loss[loss=0.0157, audio_tagging_loss=0.0157, over 21830.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4941198.19 frames. ], batch size: 107, lr: 3.83e-03, grad_scale: 64.0
2023-12-23 01:40:06,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=887213.3333333334, ans=0.0
2023-12-23 01:40:11,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=887213.3333333334, ans=0.0
2023-12-23 01:40:28,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0
2023-12-23 01:40:35,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0
2023-12-23 01:40:46,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0
2023-12-23 01:40:55,228 INFO [train.py:886] (1/4) Epoch 28, batch 4450, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4934409.75 frames.
], batch size: 99, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:41:23,624 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.227e+01 3.347e+01 3.540e+01 3.940e+01, threshold=6.695e+01, percent-clipped=0.0 2023-12-23 01:41:28,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=887746.6666666666, ans=0.125 2023-12-23 01:41:28,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=887746.6666666666, ans=0.1 2023-12-23 01:41:35,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=887813.3333333334, ans=0.125 2023-12-23 01:41:41,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=887813.3333333334, ans=0.0 2023-12-23 01:41:43,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.96 vs. limit=22.5 2023-12-23 01:41:45,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=887813.3333333334, ans=0.0 2023-12-23 01:41:47,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=887880.0, ans=0.125 2023-12-23 01:41:48,290 INFO [train.py:886] (1/4) Epoch 28, batch 4500, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4937439.61 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:42:04,647 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:42:10,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=888013.3333333334, ans=0.0 2023-12-23 01:42:18,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=888080.0, ans=0.125 2023-12-23 01:42:23,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-12-23 01:42:24,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=888080.0, ans=0.0 2023-12-23 01:42:38,517 INFO [train.py:886] (1/4) Epoch 28, batch 4550, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4946865.35 frames. 
], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:43:04,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=888346.6666666666, ans=0.125 2023-12-23 01:43:06,886 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+01 3.196e+01 3.308e+01 3.514e+01 4.018e+01, threshold=6.616e+01, percent-clipped=0.0 2023-12-23 01:43:09,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=888413.3333333334, ans=0.125 2023-12-23 01:43:13,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=888413.3333333334, ans=0.125 2023-12-23 01:43:17,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=888413.3333333334, ans=0.125 2023-12-23 01:43:24,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=888480.0, ans=0.125 2023-12-23 01:43:31,514 INFO [train.py:886] (1/4) Epoch 28, batch 4600, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4952283.57 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:43:37,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888546.6666666666, ans=0.1 2023-12-23 01:43:51,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=888680.0, ans=0.125 2023-12-23 01:43:53,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=888680.0, ans=0.125 2023-12-23 01:43:57,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=888680.0, ans=0.0 2023-12-23 01:43:59,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=888680.0, ans=0.125 2023-12-23 01:44:23,030 INFO [train.py:886] (1/4) Epoch 28, batch 4650, loss[loss=0.009989, audio_tagging_loss=0.009989, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4956009.24 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:44:43,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=889013.3333333334, ans=0.125 2023-12-23 01:44:49,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=889013.3333333334, ans=0.125 2023-12-23 01:44:50,547 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.857e+01 3.209e+01 3.315e+01 3.492e+01 4.117e+01, threshold=6.630e+01, percent-clipped=0.0 2023-12-23 01:44:54,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-12-23 01:44:55,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=889080.0, ans=0.05 2023-12-23 01:45:13,975 INFO [train.py:886] (1/4) Epoch 28, batch 4700, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. 
], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4952859.54 frames. ], batch size: 99, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:45:41,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=889413.3333333334, ans=0.1 2023-12-23 01:45:56,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889480.0, ans=0.1 2023-12-23 01:46:00,588 INFO [train.py:886] (1/4) Epoch 28, batch 4750, loss[loss=0.016, audio_tagging_loss=0.016, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4945305.04 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:46:12,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=889613.3333333334, ans=0.125 2023-12-23 01:46:36,938 INFO [train.py:886] (1/4) Epoch 29, batch 0, loss[loss=0.02812, audio_tagging_loss=0.02812, over 25000.00 frames. ], tot_loss[loss=0.02812, audio_tagging_loss=0.02812, over 25000.00 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:46:36,938 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 01:46:48,440 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2442, 3.4880, 3.7170, 3.5425], device='cuda:1') 2023-12-23 01:46:49,752 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7016, 3.3928, 3.8930, 3.8696], device='cuda:1') 2023-12-23 01:46:58,158 INFO [train.py:917] (1/4) Epoch 29, validation: loss=0.03319, audio_tagging_loss=0.03319, over 3737520.00 frames. 2023-12-23 01:46:58,159 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 01:46:59,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2023-12-23 01:47:10,288 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.924e+01 3.242e+01 3.406e+01 3.707e+01 9.005e+01, threshold=6.813e+01, percent-clipped=9.0 2023-12-23 01:47:27,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=889786.6666666666, ans=0.0 2023-12-23 01:47:49,193 INFO [train.py:886] (1/4) Epoch 29, batch 50, loss[loss=0.01901, audio_tagging_loss=0.01901, over 25000.00 frames. ], tot_loss[loss=0.02024, audio_tagging_loss=0.02024, over 1121594.61 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:47:49,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2023-12-23 01:48:05,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.48 vs. 
limit=22.5 2023-12-23 01:48:34,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=890253.3333333334, ans=0.2 2023-12-23 01:48:34,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=890253.3333333334, ans=0.0 2023-12-23 01:48:38,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=890253.3333333334, ans=0.0 2023-12-23 01:48:41,287 INFO [train.py:886] (1/4) Epoch 29, batch 100, loss[loss=0.01559, audio_tagging_loss=0.01559, over 24750.00 frames. ], tot_loss[loss=0.01737, audio_tagging_loss=0.01737, over 1971989.15 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:48:41,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.17 vs. limit=22.5 2023-12-23 01:48:43,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=890320.0, ans=0.09899494936611666 2023-12-23 01:48:53,292 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.271e+01 3.674e+01 3.939e+01 4.263e+01 5.538e+01, threshold=7.878e+01, percent-clipped=0.0 2023-12-23 01:49:02,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890453.3333333334, ans=0.1 2023-12-23 01:49:12,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=890520.0, ans=0.0 2023-12-23 01:49:25,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890586.6666666666, ans=0.125 2023-12-23 01:49:32,036 INFO [train.py:886] (1/4) Epoch 29, batch 150, loss[loss=0.0136, audio_tagging_loss=0.0136, over 24750.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 2635867.79 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:49:38,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=890653.3333333334, ans=0.125 2023-12-23 01:49:46,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2023-12-23 01:49:48,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-12-23 01:49:55,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=890786.6666666666, ans=0.0 2023-12-23 01:50:24,557 INFO [train.py:886] (1/4) Epoch 29, batch 200, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 3154050.38 frames. 
], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:50:34,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=891053.3333333334, ans=0.125 2023-12-23 01:50:36,602 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.903e+01 3.220e+01 3.343e+01 3.508e+01 4.197e+01, threshold=6.685e+01, percent-clipped=0.0 2023-12-23 01:51:16,770 INFO [train.py:886] (1/4) Epoch 29, batch 250, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 3554218.60 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:51:21,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=891320.0, ans=0.125 2023-12-23 01:51:28,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=891386.6666666666, ans=0.125 2023-12-23 01:52:08,238 INFO [train.py:886] (1/4) Epoch 29, batch 300, loss[loss=0.01265, audio_tagging_loss=0.01265, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 3859705.44 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:52:14,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2023-12-23 01:52:19,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-23 01:52:20,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.835e+01 3.236e+01 3.377e+01 3.518e+01 3.995e+01, threshold=6.753e+01, percent-clipped=0.0 2023-12-23 01:52:26,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891720.0, ans=0.125 2023-12-23 01:52:27,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.13 vs. limit=12.0 2023-12-23 01:52:29,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891786.6666666666, ans=0.1 2023-12-23 01:52:37,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2023-12-23 01:52:41,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2023-12-23 01:53:00,223 INFO [train.py:886] (1/4) Epoch 29, batch 350, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4096684.49 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:53:01,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0 2023-12-23 01:53:12,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=892053.3333333334, ans=0.2 2023-12-23 01:53:23,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.06 vs. 
limit=22.5 2023-12-23 01:53:28,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2023-12-23 01:53:29,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=892120.0, ans=0.125 2023-12-23 01:53:38,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892186.6666666666, ans=0.1 2023-12-23 01:53:47,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-12-23 01:53:49,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5 2023-12-23 01:53:51,861 INFO [train.py:886] (1/4) Epoch 29, batch 400, loss[loss=0.01034, audio_tagging_loss=0.01034, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4285469.15 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:53:52,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2023-12-23 01:53:59,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=892320.0, ans=0.125 2023-12-23 01:54:04,658 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.891e+01 3.197e+01 3.293e+01 3.447e+01 4.008e+01, threshold=6.587e+01, percent-clipped=0.0 2023-12-23 01:54:05,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=892386.6666666666, ans=0.0 2023-12-23 01:54:07,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=892386.6666666666, ans=0.2 2023-12-23 01:54:09,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=892386.6666666666, ans=0.0 2023-12-23 01:54:12,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-12-23 01:54:14,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=892453.3333333334, ans=0.1 2023-12-23 01:54:31,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=892520.0, ans=0.0 2023-12-23 01:54:31,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892520.0, ans=0.1 2023-12-23 01:54:43,569 INFO [train.py:886] (1/4) Epoch 29, batch 450, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4434936.83 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:55:26,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. 
limit=15.0 2023-12-23 01:55:30,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2023-12-23 01:55:36,734 INFO [train.py:886] (1/4) Epoch 29, batch 500, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4551254.23 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:55:46,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2023-12-23 01:55:48,273 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.779e+01 3.131e+01 3.260e+01 3.424e+01 4.258e+01, threshold=6.520e+01, percent-clipped=0.0 2023-12-23 01:55:51,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=893053.3333333334, ans=0.125 2023-12-23 01:55:56,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2023-12-23 01:55:56,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=893120.0, ans=0.125 2023-12-23 01:56:04,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=12.0 2023-12-23 01:56:07,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=893186.6666666666, ans=0.0 2023-12-23 01:56:24,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=893253.3333333334, ans=0.125 2023-12-23 01:56:25,380 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:56:27,083 INFO [train.py:886] (1/4) Epoch 29, batch 550, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4640694.01 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:56:49,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=893453.3333333334, ans=0.125 2023-12-23 01:57:01,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=893520.0, ans=0.125 2023-12-23 01:57:20,504 INFO [train.py:886] (1/4) Epoch 29, batch 600, loss[loss=0.01308, audio_tagging_loss=0.01308, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4711167.56 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:57:24,425 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:57:31,800 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.900e+01 3.207e+01 3.346e+01 3.493e+01 4.560e+01, threshold=6.691e+01, percent-clipped=0.0 2023-12-23 01:58:12,569 INFO [train.py:886] (1/4) Epoch 29, batch 650, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4761416.70 frames. 
], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:58:26,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-12-23 01:58:31,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=894120.0, ans=0.04949747468305833 2023-12-23 01:58:40,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=894120.0, ans=0.07 2023-12-23 01:58:45,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=894186.6666666666, ans=0.2 2023-12-23 01:58:45,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0 2023-12-23 01:58:51,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2023-12-23 01:59:03,458 INFO [train.py:886] (1/4) Epoch 29, batch 700, loss[loss=0.0159, audio_tagging_loss=0.0159, over 24750.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4803260.89 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 01:59:05,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=894320.0, ans=15.0 2023-12-23 01:59:15,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=894386.6666666666, ans=0.2 2023-12-23 01:59:16,889 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.181e+01 3.381e+01 3.506e+01 3.965e+01, threshold=6.761e+01, percent-clipped=0.0 2023-12-23 01:59:31,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=894453.3333333334, ans=0.125 2023-12-23 01:59:33,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894520.0, ans=0.125 2023-12-23 01:59:56,274 INFO [train.py:886] (1/4) Epoch 29, batch 750, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4840033.79 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 01:59:56,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-23 02:00:02,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. 
2023-12-23 02:00:03,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=894653.3333333334, ans=0.125
2023-12-23 02:00:06,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=894720.0, ans=0.0
2023-12-23 02:00:08,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=894720.0, ans=0.015
2023-12-23 02:00:17,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs. limit=22.5
2023-12-23 02:00:24,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=894786.6666666666, ans=0.125
2023-12-23 02:00:42,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=894920.0, ans=0.125
2023-12-23 02:00:45,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=894986.6666666666, ans=0.0
2023-12-23 02:00:46,096 INFO [train.py:886] (1/4) Epoch 29, batch 800, loss[loss=0.01153, audio_tagging_loss=0.01153, over 21462.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4863401.32 frames. ], batch size: 107, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:00:54,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=894986.6666666666, ans=0.0
2023-12-23 02:00:59,626 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.158e+01 3.303e+01 3.490e+01 4.206e+01, threshold=6.605e+01, percent-clipped=0.0
2023-12-23 02:00:59,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=895053.3333333334, ans=0.125
2023-12-23 02:01:05,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=895053.3333333334, ans=0.125
2023-12-23 02:01:05,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0
2023-12-23 02:01:37,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=895320.0, ans=0.125
2023-12-23 02:01:38,400 INFO [train.py:886] (1/4) Epoch 29, batch 850, loss[loss=0.0138, audio_tagging_loss=0.0138, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4887337.56 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:01:46,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895320.0, ans=0.1
2023-12-23 02:02:16,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=895520.0, ans=0.125
2023-12-23 02:02:29,870 INFO [train.py:886] (1/4) Epoch 29, batch 900, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4898805.10 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
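[Editor's sketch] The scaling.py:213 lines record module hyperparameters (dropout probabilities, skip rates, scale bounds) whose values change with the global batch_count. A minimal piecewise-linear schedule reproduces the idea; the class and breakpoints below are illustrative only, not icefall's actual ScheduledFloat, which carries more machinery:

    import bisect


    class PiecewiseLinearSchedule:
        # Linearly interpolates a float between (batch_count, value) breakpoints.
        def __init__(self, *points):
            self.xs = [x for x, _ in points]  # must be sorted ascending
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)


    # E.g. a dropout probability decaying from 0.3 to 0.1 early in training,
    # then held constant (breakpoints invented for illustration):
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(893053.33))  # -> 0.1 once past the last breakpoint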
2023-12-23 02:02:31,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0
2023-12-23 02:02:42,537 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+01 3.196e+01 3.316e+01 3.462e+01 4.110e+01, threshold=6.632e+01, percent-clipped=0.0
2023-12-23 02:02:50,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=895786.6666666666, ans=0.0
2023-12-23 02:03:01,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=895853.3333333334, ans=0.07
2023-12-23 02:03:20,998 INFO [train.py:886] (1/4) Epoch 29, batch 950, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4905457.41 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:03:43,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=896120.0, ans=0.125
2023-12-23 02:03:48,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896120.0, ans=0.1
2023-12-23 02:03:54,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=896186.6666666666, ans=0.125
2023-12-23 02:04:04,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0
2023-12-23 02:04:05,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=896253.3333333334, ans=0.0
2023-12-23 02:04:13,323 INFO [train.py:886] (1/4) Epoch 29, batch 1000, loss[loss=0.01094, audio_tagging_loss=0.01094, over 24750.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4910227.07 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:04:20,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=896320.0, ans=0.125
2023-12-23 02:04:24,551 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.234e+01 3.376e+01 3.564e+01 4.018e+01, threshold=6.752e+01, percent-clipped=0.0
2023-12-23 02:04:37,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=896453.3333333334, ans=0.125
2023-12-23 02:04:46,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896520.0, ans=0.1
2023-12-23 02:04:47,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=15.0
2023-12-23 02:04:48,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=896520.0, ans=0.0
2023-12-23 02:04:58,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=896586.6666666666, ans=0.2
2023-12-23 02:05:03,443 INFO [train.py:886] (1/4) Epoch 29, batch 1050, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4923458.60 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:05:05,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896653.3333333334, ans=0.1
2023-12-23 02:05:06,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896653.3333333334, ans=0.1
2023-12-23 02:05:08,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=896653.3333333334, ans=0.1
2023-12-23 02:05:09,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=896653.3333333334, ans=0.125
2023-12-23 02:05:17,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=896720.0, ans=0.0
2023-12-23 02:05:21,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=896720.0, ans=0.05
2023-12-23 02:05:32,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=896786.6666666666, ans=0.2
2023-12-23 02:05:33,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896853.3333333334, ans=0.1
2023-12-23 02:05:41,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=896853.3333333334, ans=0.0
2023-12-23 02:05:45,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=896920.0, ans=0.125
2023-12-23 02:05:55,171 INFO [train.py:886] (1/4) Epoch 29, batch 1100, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4933814.34 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:05:56,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0
2023-12-23 02:06:07,975 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.193e+01 3.319e+01 3.484e+01 4.077e+01, threshold=6.637e+01, percent-clipped=0.0
2023-12-23 02:06:18,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2023-12-23 02:06:20,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=897120.0, ans=0.0
2023-12-23 02:06:25,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=897120.0, ans=0.2
2023-12-23 02:06:46,466 INFO [train.py:886] (1/4) Epoch 29, batch 1150, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4943638.92 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:06:51,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=897320.0, ans=0.0
2023-12-23 02:07:14,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=897453.3333333334, ans=0.05
2023-12-23 02:07:38,469 INFO [train.py:886] (1/4) Epoch 29, batch 1200, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4954086.36 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:07:50,592 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.246e+01 3.372e+01 3.513e+01 4.009e+01, threshold=6.745e+01, percent-clipped=0.0
2023-12-23 02:08:01,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0
2023-12-23 02:08:08,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=897853.3333333334, ans=0.1
2023-12-23 02:08:08,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0
2023-12-23 02:08:11,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=897853.3333333334, ans=0.125
2023-12-23 02:08:21,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=897920.0, ans=0.2
2023-12-23 02:08:29,791 INFO [train.py:886] (1/4) Epoch 29, batch 1250, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4949625.03 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:08:37,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0
2023-12-23 02:08:59,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=898120.0, ans=0.125
2023-12-23 02:09:18,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.08 vs. limit=22.5
2023-12-23 02:09:21,421 INFO [train.py:886] (1/4) Epoch 29, batch 1300, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4948175.89 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0
2023-12-23 02:09:33,509 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.906e+01 3.213e+01 3.404e+01 3.516e+01 4.030e+01, threshold=6.807e+01, percent-clipped=0.0
2023-12-23 02:09:49,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=898453.3333333334, ans=0.0
2023-12-23 02:10:12,329 INFO [train.py:886] (1/4) Epoch 29, batch 1350, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4946826.07 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0
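[Editor's sketch] The scaling.py:1022 lines compare a whitening "metric" against a limit for each whitening point in the encoder. One simple way to score how far activations are from white is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1 exactly when all eigenvalues are equal (features white up to scale). This is a plausible diagnostic written from scratch; it is not claimed to match the exact formula in icefall's scaling.py:

    import torch


    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels are split into groups.
        n, c = x.shape
        assert c % num_groups == 0
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n  # per-group covariance, (g, d, d)
        d = cov.shape[-1]
        tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)  # sum of eigenvalues
        tr_sq = (cov * cov).sum(dim=(-2, -1))        # sum of squared eigenvalues
        # d * trace(C^2) / trace(C)^2 = mean(eig^2) / mean(eig)^2 >= 1
        return (d * tr_sq / tr.pow(2)).mean().item()


    print(whitening_metric(torch.randn(1000, 384)))  # close to 1 for white noise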
2023-12-23 02:10:15,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-12-23 02:10:16,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0
2023-12-23 02:10:18,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=898653.3333333334, ans=0.2
2023-12-23 02:10:40,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=898786.6666666666, ans=0.0
2023-12-23 02:10:40,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898786.6666666666, ans=0.1
2023-12-23 02:10:41,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0
2023-12-23 02:10:45,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=898853.3333333334, ans=0.125
2023-12-23 02:10:56,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=898920.0, ans=0.0
2023-12-23 02:11:03,479 INFO [train.py:886] (1/4) Epoch 29, batch 1400, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4952384.96 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:11:07,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=898986.6666666666, ans=0.95
2023-12-23 02:11:14,759 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.186e+01 3.286e+01 3.462e+01 3.963e+01, threshold=6.572e+01, percent-clipped=0.0
2023-12-23 02:11:17,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=899053.3333333334, ans=0.0
2023-12-23 02:11:40,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=899186.6666666666, ans=0.125
2023-12-23 02:11:45,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899253.3333333334, ans=0.1
2023-12-23 02:11:48,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=899253.3333333334, ans=0.125
2023-12-23 02:11:53,860 INFO [train.py:886] (1/4) Epoch 29, batch 1450, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4959066.32 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:12:03,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0
2023-12-23 02:12:03,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0
2023-12-23 02:12:19,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=899453.3333333334, ans=0.0
2023-12-23 02:12:20,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=899453.3333333334, ans=0.125
2023-12-23 02:12:33,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=899586.6666666666, ans=0.5
2023-12-23 02:12:44,698 INFO [train.py:886] (1/4) Epoch 29, batch 1500, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4959794.23 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:12:51,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=899653.3333333334, ans=0.125
2023-12-23 02:12:54,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=899720.0, ans=0.125
2023-12-23 02:12:56,602 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.983e+01 3.233e+01 3.348e+01 3.462e+01 4.143e+01, threshold=6.696e+01, percent-clipped=0.0
2023-12-23 02:13:05,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0
2023-12-23 02:13:08,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=899786.6666666666, ans=0.0
2023-12-23 02:13:36,105 INFO [train.py:886] (1/4) Epoch 29, batch 1550, loss[loss=0.01048, audio_tagging_loss=0.01048, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4957310.52 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:14:07,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900186.6666666666, ans=0.1
2023-12-23 02:14:10,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=900186.6666666666, ans=0.05
2023-12-23 02:14:19,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=900253.3333333334, ans=0.125
2023-12-23 02:14:26,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=900320.0, ans=0.125
2023-12-23 02:14:27,187 INFO [train.py:886] (1/4) Epoch 29, batch 1600, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4953631.68 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:14:28,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=900320.0, ans=0.2
2023-12-23 02:14:40,714 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.938e+01 3.277e+01 3.394e+01 3.581e+01 4.487e+01, threshold=6.788e+01, percent-clipped=0.0
2023-12-23 02:15:19,556 INFO [train.py:886] (1/4) Epoch 29, batch 1650, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4955612.52 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:15:35,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=900720.0, ans=0.0
2023-12-23 02:15:41,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0
2023-12-23 02:15:45,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0
2023-12-23 02:15:53,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=900853.3333333334, ans=0.0
2023-12-23 02:15:57,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=900853.3333333334, ans=0.2
2023-12-23 02:16:10,151 INFO [train.py:886] (1/4) Epoch 29, batch 1700, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4957126.30 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:16:17,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900986.6666666666, ans=0.0
2023-12-23 02:16:22,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.200e+01 3.336e+01 3.521e+01 4.401e+01, threshold=6.671e+01, percent-clipped=0.0
2023-12-23 02:16:32,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=901120.0, ans=0.0
2023-12-23 02:16:53,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901253.3333333334, ans=0.0
2023-12-23 02:17:01,612 INFO [train.py:886] (1/4) Epoch 29, batch 1750, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4953481.67 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:17:08,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=901320.0, ans=0.125
2023-12-23 02:17:28,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=901453.3333333334, ans=0.0
2023-12-23 02:17:32,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=901520.0, ans=0.1
2023-12-23 02:17:35,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901520.0, ans=0.0
2023-12-23 02:17:44,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=901586.6666666666, ans=0.125
2023-12-23 02:17:51,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=901586.6666666666, ans=0.1
2023-12-23 02:17:51,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=901586.6666666666, ans=0.2
2023-12-23 02:17:53,494 INFO [train.py:886] (1/4) Epoch 29, batch 1800, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4955528.06 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:17:54,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901653.3333333334, ans=0.125
2023-12-23 02:18:01,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0
2023-12-23 02:18:05,613 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.853e+01 3.204e+01 3.323e+01 3.491e+01 3.903e+01, threshold=6.647e+01, percent-clipped=0.0
2023-12-23 02:18:39,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=901920.0, ans=0.125
2023-12-23 02:18:44,441 INFO [train.py:886] (1/4) Epoch 29, batch 1850, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4957410.02 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:18:47,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901986.6666666666, ans=0.0
2023-12-23 02:18:54,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=902053.3333333334, ans=0.0
2023-12-23 02:19:12,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=902120.0, ans=0.0
2023-12-23 02:19:15,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0
2023-12-23 02:19:18,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=902186.6666666666, ans=0.0
2023-12-23 02:19:37,448 INFO [train.py:886] (1/4) Epoch 29, batch 1900, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4949146.75 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:19:48,704 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.018e+01 3.307e+01 3.435e+01 3.560e+01 3.951e+01, threshold=6.871e+01, percent-clipped=0.0
2023-12-23 02:20:05,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2023-12-23 02:20:07,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=902520.0, ans=0.125
2023-12-23 02:20:17,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=902586.6666666666, ans=0.04949747468305833
2023-12-23 02:20:21,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=902586.6666666666, ans=0.125
2023-12-23 02:20:28,815 INFO [train.py:886] (1/4) Epoch 29, batch 1950, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4938767.68 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0
2023-12-23 02:20:42,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0
2023-12-23 02:20:58,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=902786.6666666666, ans=0.0
2023-12-23 02:21:00,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5
2023-12-23 02:21:08,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=902853.3333333334, ans=0.125
2023-12-23 02:21:17,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=902920.0, ans=0.0
2023-12-23 02:21:18,896 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:21:19,623 INFO [train.py:886] (1/4) Epoch 29, batch 2000, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4940493.59 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 64.0
2023-12-23 02:21:31,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=903053.3333333334, ans=0.05
2023-12-23 02:21:32,359 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.952e+01 3.186e+01 3.326e+01 3.515e+01 4.262e+01, threshold=6.651e+01, percent-clipped=0.0
2023-12-23 02:21:47,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903120.0, ans=0.1
2023-12-23 02:22:03,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=903253.3333333334, ans=0.0
2023-12-23 02:22:10,993 INFO [train.py:886] (1/4) Epoch 29, batch 2050, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4943715.76 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 64.0
2023-12-23 02:22:13,829 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:22:25,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=903386.6666666666, ans=0.125
2023-12-23 02:22:27,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=903386.6666666666, ans=0.0
2023-12-23 02:22:28,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=903386.6666666666, ans=0.0
2023-12-23 02:22:39,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0
2023-12-23 02:22:42,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=903520.0, ans=0.125
2023-12-23 02:22:45,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903520.0, ans=0.1
2023-12-23 02:22:49,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0
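[Editor's sketch] The grad_scale field in the batch summaries jumps from 32.0 to 64.0 around batch 2000, consistent with dynamic loss scaling under mixed precision (the run has use_fp16: True): the scale doubles after a stretch of overflow-free steps and halves on inf/nan gradients. The snippet below uses the standard torch.cuda.amp API; the model, optimizer, criterion and batch are placeholders, and this is not a copy of icefall's train.py:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, criterion, features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # grows the scale after stable stretches
        return loss.detach()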
2023-12-23 02:22:51,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=903520.0, ans=0.125
2023-12-23 02:22:55,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=903586.6666666666, ans=0.125
2023-12-23 02:22:59,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=903586.6666666666, ans=0.125
2023-12-23 02:23:00,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=12.0
2023-12-23 02:23:02,155 INFO [train.py:886] (1/4) Epoch 29, batch 2100, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4949189.86 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 64.0
2023-12-23 02:23:05,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903653.3333333334, ans=0.1
2023-12-23 02:23:10,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=903653.3333333334, ans=0.0
2023-12-23 02:23:14,811 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.861e+01 3.167e+01 3.386e+01 3.538e+01 3.863e+01, threshold=6.772e+01, percent-clipped=0.0
2023-12-23 02:23:23,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0
2023-12-23 02:23:25,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=903786.6666666666, ans=0.125
2023-12-23 02:23:37,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=903853.3333333334, ans=0.04949747468305833
2023-12-23 02:23:38,399 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5
2023-12-23 02:23:54,327 INFO [train.py:886] (1/4) Epoch 29, batch 2150, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4954338.68 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:23:58,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903986.6666666666, ans=0.1
2023-12-23 02:24:15,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=904120.0, ans=0.0
2023-12-23 02:24:45,883 INFO [train.py:886] (1/4) Epoch 29, batch 2200, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24049.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4953162.58 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:24:47,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0
2023-12-23 02:24:49,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=12.0
2023-12-23 02:24:56,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0
2023-12-23 02:24:57,913 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.956e+01 3.244e+01 3.385e+01 3.592e+01 6.696e+01, threshold=6.770e+01, percent-clipped=0.0
2023-12-23 02:25:01,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=904386.6666666666, ans=0.125
2023-12-23 02:25:15,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904453.3333333334, ans=0.1
2023-12-23 02:25:25,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=904520.0, ans=0.125
2023-12-23 02:25:26,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=904586.6666666666, ans=0.125
2023-12-23 02:25:36,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=904586.6666666666, ans=0.0
2023-12-23 02:25:37,571 INFO [train.py:886] (1/4) Epoch 29, batch 2250, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4953237.06 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:25:45,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=904653.3333333334, ans=0.125
2023-12-23 02:26:06,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=904786.6666666666, ans=0.125
2023-12-23 02:26:07,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=904786.6666666666, ans=0.0
2023-12-23 02:26:08,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=904853.3333333334, ans=0.125
2023-12-23 02:26:25,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904920.0, ans=0.1
2023-12-23 02:26:28,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=904986.6666666666, ans=0.125
2023-12-23 02:26:29,636 INFO [train.py:886] (1/4) Epoch 29, batch 2300, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4950563.08 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:26:41,539 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.860e+01 3.170e+01 3.367e+01 3.537e+01 3.962e+01, threshold=6.735e+01, percent-clipped=0.0
2023-12-23 02:27:04,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=905186.6666666666, ans=0.025
2023-12-23 02:27:05,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=905186.6666666666, ans=0.5
2023-12-23 02:27:15,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0
2023-12-23 02:27:19,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0
2023-12-23 02:27:21,212 INFO [train.py:886] (1/4) Epoch 29, batch 2350, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4952933.81 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:27:23,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=905320.0, ans=0.2
2023-12-23 02:27:29,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=905320.0, ans=0.125
2023-12-23 02:27:58,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=905520.0, ans=0.09899494936611666
2023-12-23 02:28:12,505 INFO [train.py:886] (1/4) Epoch 29, batch 2400, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4956134.22 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:28:25,400 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+01 3.172e+01 3.352e+01 3.505e+01 4.147e+01, threshold=6.703e+01, percent-clipped=0.0
2023-12-23 02:28:28,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=905720.0, ans=0.0
2023-12-23 02:28:37,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=905786.6666666666, ans=0.0
2023-12-23 02:28:42,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=905853.3333333334, ans=0.125
2023-12-23 02:28:43,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=905853.3333333334, ans=0.1
2023-12-23 02:29:03,273 INFO [train.py:886] (1/4) Epoch 29, batch 2450, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4958298.24 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
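[Editor's sketch] Each train.py:886 line reports both the current batch's loss over its own frames and a tot_loss aggregated over a much larger frame count. A plain frame-weighted running average reproduces the shape of those numbers; icefall's own MetricsTracker may window or decay the aggregate differently, so treat this as illustration only:

    class RunningLoss:
        # Frame-weighted cumulative loss, in the spirit of tot_loss[...].
        def __init__(self) -> None:
            self.loss_sum = 0.0  # sum over batches of per-frame loss * frames
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum += batch_loss * batch_frames
            self.frames += batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)


    tracker = RunningLoss()
    tracker.update(0.01281, 25000.0)
    tracker.update(0.01314, 25000.0)
    print(f"tot_loss[loss={tracker.tot_loss:.4g}, over {tracker.frames:.2f} frames.]")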
2023-12-23 02:29:04,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=905986.6666666666, ans=0.035
2023-12-23 02:29:30,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=906120.0, ans=0.125
2023-12-23 02:29:43,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=906253.3333333334, ans=0.125
2023-12-23 02:29:54,562 INFO [train.py:886] (1/4) Epoch 29, batch 2500, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4955175.11 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:30:04,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=15.0
2023-12-23 02:30:05,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0
2023-12-23 02:30:06,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906386.6666666666, ans=0.1
2023-12-23 02:30:07,120 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.911e+01 3.261e+01 3.366e+01 3.535e+01 4.148e+01, threshold=6.733e+01, percent-clipped=0.0
2023-12-23 02:30:12,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=906386.6666666666, ans=0.0
2023-12-23 02:30:28,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2023-12-23 02:30:46,637 INFO [train.py:886] (1/4) Epoch 29, batch 2550, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4948967.25 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:31:28,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=906920.0, ans=0.0
2023-12-23 02:31:29,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=906920.0, ans=0.1
2023-12-23 02:31:38,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0
2023-12-23 02:31:39,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=906920.0, ans=0.125
2023-12-23 02:31:41,242 INFO [train.py:886] (1/4) Epoch 29, batch 2600, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4953163.11 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:31:53,149 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.908e+01 3.274e+01 3.410e+01 3.573e+01 4.224e+01, threshold=6.821e+01, percent-clipped=0.0
2023-12-23 02:32:07,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=907120.0, ans=0.1
2023-12-23 02:32:10,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=907120.0, ans=0.125
2023-12-23 02:32:32,189 INFO [train.py:886] (1/4) Epoch 29, batch 2650, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4952075.89 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:32:50,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0
2023-12-23 02:33:02,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. limit=10.0
2023-12-23 02:33:15,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=907586.6666666666, ans=0.125
2023-12-23 02:33:25,041 INFO [train.py:886] (1/4) Epoch 29, batch 2700, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4953637.82 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:33:34,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=907720.0, ans=0.2
2023-12-23 02:33:36,309 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.174e+01 3.334e+01 3.479e+01 4.050e+01, threshold=6.667e+01, percent-clipped=0.0
2023-12-23 02:33:36,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=907720.0, ans=0.125
2023-12-23 02:34:08,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. limit=6.0
2023-12-23 02:34:11,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907920.0, ans=0.1
2023-12-23 02:34:14,948 INFO [train.py:886] (1/4) Epoch 29, batch 2750, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4957577.56 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:34:16,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=907986.6666666666, ans=0.2
2023-12-23 02:34:22,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=907986.6666666666, ans=0.0
2023-12-23 02:34:37,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=908120.0, ans=0.125
2023-12-23 02:35:06,886 INFO [train.py:886] (1/4) Epoch 29, batch 2800, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4947234.87 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0
2023-12-23 02:35:18,190 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.185e+01 3.347e+01 3.494e+01 4.048e+01, threshold=6.695e+01, percent-clipped=0.0
2023-12-23 02:35:23,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5
2023-12-23 02:35:29,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=908453.3333333334, ans=0.125
2023-12-23 02:35:32,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=908453.3333333334, ans=0.125
2023-12-23 02:35:53,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0
2023-12-23 02:35:58,181 INFO [train.py:886] (1/4) Epoch 29, batch 2850, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4943510.06 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:35:58,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=908653.3333333334, ans=0.125
2023-12-23 02:36:01,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=908653.3333333334, ans=0.2
2023-12-23 02:36:18,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=908786.6666666666, ans=0.125
2023-12-23 02:36:28,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0
2023-12-23 02:36:34,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=908853.3333333334, ans=0.125
2023-12-23 02:36:44,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=908920.0, ans=0.0
2023-12-23 02:36:47,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-12-23 02:36:48,909 INFO [train.py:886] (1/4) Epoch 29, batch 2900, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4943012.95 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:37:02,340 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.255e+01 3.355e+01 3.547e+01 3.874e+01, threshold=6.710e+01, percent-clipped=0.0
2023-12-23 02:37:03,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=909053.3333333334, ans=0.1
2023-12-23 02:37:16,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=909120.0, ans=0.1
2023-12-23 02:37:17,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=909120.0, ans=0.125
2023-12-23 02:37:19,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0
2023-12-23 02:37:22,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=909186.6666666666, ans=15.0
2023-12-23 02:37:31,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=909253.3333333334, ans=0.125
2023-12-23 02:37:41,167 INFO [train.py:886] (1/4) Epoch 29, batch 2950, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4945501.01 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:37:57,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0
2023-12-23 02:37:58,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0
2023-12-23 02:38:05,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=909453.3333333334, ans=0.125
2023-12-23 02:38:32,181 INFO [train.py:886] (1/4) Epoch 29, batch 3000, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4949386.46 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:38:32,182 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 02:38:52,608 INFO [train.py:917] (1/4) Epoch 29, validation: loss=0.03351, audio_tagging_loss=0.03351, over 3737520.00 frames.
2023-12-23 02:38:52,608 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 02:38:56,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=909653.3333333334, ans=0.0
2023-12-23 02:39:02,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=909720.0, ans=0.125
2023-12-23 02:39:03,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=909720.0, ans=0.125
2023-12-23 02:39:06,001 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.194e+01 3.342e+01 3.489e+01 4.333e+01, threshold=6.683e+01, percent-clipped=0.0
2023-12-23 02:39:45,472 INFO [train.py:886] (1/4) Epoch 29, batch 3050, loss[loss=0.01528, audio_tagging_loss=0.01528, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4947852.93 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
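[Editor's sketch] The train.py:909/917/918 lines above show a periodic validation pass over a fixed 3,737,520-frame dev set, run with the model in eval mode. For a 527-class audio tagger this is typically a no-grad loop accumulating a multi-label binary cross-entropy; the loss choice and loop structure below are assumptions for illustration, not icefall's exact compute_validation_loss:

    import torch


    def compute_validation_loss(model, valid_loader, device) -> float:
        # Frame-normalized validation loss over the whole dev set.
        model.eval()
        criterion = torch.nn.BCEWithLogitsLoss(reduction="sum")
        total_loss, total_frames = 0.0, 0.0
        with torch.no_grad():
            for features, targets, num_frames in valid_loader:
                # features: (N, T, 80) fbank; targets: (N, 527) multi-hot labels
                logits = model(features.to(device))
                total_loss += criterion(logits, targets.to(device)).item()
                total_frames += float(num_frames.sum())
        model.train()
        return total_loss / max(total_frames, 1.0)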
2023-12-23 02:39:52,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=909986.6666666666, ans=0.0
2023-12-23 02:40:08,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910120.0, ans=0.1
2023-12-23 02:40:08,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0
2023-12-23 02:40:13,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=910120.0, ans=0.125
2023-12-23 02:40:15,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=910186.6666666666, ans=0.125
2023-12-23 02:40:34,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=910253.3333333334, ans=0.0
2023-12-23 02:40:36,999 INFO [train.py:886] (1/4) Epoch 29, batch 3100, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24950.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4948854.57 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:40:39,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=910320.0, ans=0.125
2023-12-23 02:40:39,943 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 02:40:42,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=910320.0, ans=0.125
2023-12-23 02:40:48,925 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.900e+01 3.214e+01 3.353e+01 3.491e+01 4.472e+01, threshold=6.707e+01, percent-clipped=0.0
2023-12-23 02:40:52,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=910386.6666666666, ans=0.125
2023-12-23 02:41:12,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=910520.0, ans=0.125
2023-12-23 02:41:27,512 INFO [train.py:886] (1/4) Epoch 29, batch 3150, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4949423.09 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:42:00,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=910853.3333333334, ans=0.125
2023-12-23 02:42:19,863 INFO [train.py:886] (1/4) Epoch 29, batch 3200, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4942594.60 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:42:29,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=911053.3333333334, ans=0.125
2023-12-23 02:42:31,884 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.238e+01 3.417e+01 3.574e+01 4.125e+01, threshold=6.834e+01, percent-clipped=0.0
2023-12-23 02:42:35,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=911053.3333333334, ans=0.125
2023-12-23 02:42:35,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=911053.3333333334, ans=0.0
2023-12-23 02:42:58,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.73 vs. limit=15.0
2023-12-23 02:43:11,990 INFO [train.py:886] (1/4) Epoch 29, batch 3250, loss[loss=0.01383, audio_tagging_loss=0.01383, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4942630.80 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
2023-12-23 02:43:18,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=911320.0, ans=0.125
2023-12-23 02:43:21,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2023-12-23 02:43:28,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=911386.6666666666, ans=0.0
2023-12-23 02:44:02,798 INFO [train.py:886] (1/4) Epoch 29, batch 3300, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4947568.22 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0
], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:44:11,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=911720.0, ans=0.04949747468305833 2023-12-23 02:44:14,884 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.877e+01 3.159e+01 3.341e+01 3.490e+01 4.241e+01, threshold=6.682e+01, percent-clipped=0.0 2023-12-23 02:44:16,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=911720.0, ans=0.125 2023-12-23 02:44:20,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=911720.0, ans=0.1 2023-12-23 02:44:23,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=911786.6666666666, ans=0.0 2023-12-23 02:44:27,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911786.6666666666, ans=0.1 2023-12-23 02:44:31,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=911786.6666666666, ans=0.09899494936611666 2023-12-23 02:44:31,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=911786.6666666666, ans=0.2 2023-12-23 02:44:38,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=911853.3333333334, ans=0.125 2023-12-23 02:44:53,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2023-12-23 02:44:53,541 INFO [train.py:886] (1/4) Epoch 29, batch 3350, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4951270.52 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:45:17,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=912120.0, ans=0.1 2023-12-23 02:45:28,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-12-23 02:45:43,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=912253.3333333334, ans=0.125 2023-12-23 02:45:45,273 INFO [train.py:886] (1/4) Epoch 29, batch 3400, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4960286.89 frames. 
], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:45:48,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=912320.0, ans=0.0 2023-12-23 02:45:50,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=912320.0, ans=0.0 2023-12-23 02:45:57,955 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.777e+01 3.196e+01 3.334e+01 3.498e+01 4.044e+01, threshold=6.669e+01, percent-clipped=0.0 2023-12-23 02:46:03,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=912386.6666666666, ans=0.125 2023-12-23 02:46:04,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=912386.6666666666, ans=0.125 2023-12-23 02:46:17,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=912520.0, ans=0.025 2023-12-23 02:46:25,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=912586.6666666666, ans=0.125 2023-12-23 02:46:36,487 INFO [train.py:886] (1/4) Epoch 29, batch 3450, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4954324.17 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:46:53,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=912720.0, ans=0.1 2023-12-23 02:46:55,142 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:47:01,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2023-12-23 02:47:08,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=912853.3333333334, ans=0.125 2023-12-23 02:47:28,191 INFO [train.py:886] (1/4) Epoch 29, batch 3500, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4947856.84 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:47:40,255 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.263e+01 3.374e+01 3.511e+01 3.785e+01, threshold=6.748e+01, percent-clipped=0.0 2023-12-23 02:47:59,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=913186.6666666666, ans=0.125 2023-12-23 02:48:06,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=913186.6666666666, ans=0.0 2023-12-23 02:48:18,280 INFO [train.py:886] (1/4) Epoch 29, batch 3550, loss[loss=0.01557, audio_tagging_loss=0.01557, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4950158.16 frames. 
], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:48:48,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=913520.0, ans=0.1 2023-12-23 02:48:51,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=913520.0, ans=0.0 2023-12-23 02:48:55,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2023-12-23 02:48:59,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=913586.6666666666, ans=0.0 2023-12-23 02:49:11,427 INFO [train.py:886] (1/4) Epoch 29, batch 3600, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4955246.30 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:49:14,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2023-12-23 02:49:17,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=913653.3333333334, ans=0.0 2023-12-23 02:49:22,666 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.831e+01 3.175e+01 3.361e+01 3.491e+01 3.935e+01, threshold=6.721e+01, percent-clipped=0.0 2023-12-23 02:49:29,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=913720.0, ans=0.125 2023-12-23 02:49:40,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=913853.3333333334, ans=0.0 2023-12-23 02:49:49,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=913853.3333333334, ans=0.125 2023-12-23 02:49:53,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:50:02,208 INFO [train.py:886] (1/4) Epoch 29, batch 3650, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4957307.78 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:50:03,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=15.0 2023-12-23 02:50:12,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=914053.3333333334, ans=0.125 2023-12-23 02:50:12,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2023-12-23 02:50:23,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=914120.0, ans=0.125 2023-12-23 02:50:28,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=914120.0, ans=0.2 2023-12-23 02:50:29,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=914120.0, ans=0.125 2023-12-23 02:50:31,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=914186.6666666666, ans=0.125 2023-12-23 02:50:54,402 INFO [train.py:886] (1/4) Epoch 29, batch 3700, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4961093.30 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:51:05,788 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.954e+01 3.187e+01 3.336e+01 3.523e+01 3.935e+01, threshold=6.671e+01, percent-clipped=0.0 2023-12-23 02:51:23,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=914453.3333333334, ans=0.05 2023-12-23 02:51:35,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=914586.6666666666, ans=0.1 2023-12-23 02:51:46,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914653.3333333334, ans=0.1 2023-12-23 02:51:46,767 INFO [train.py:886] (1/4) Epoch 29, batch 3750, loss[loss=0.01159, audio_tagging_loss=0.01159, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4949188.52 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:51:49,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-12-23 02:51:52,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=914653.3333333334, ans=0.2 2023-12-23 02:51:55,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=914653.3333333334, ans=0.2 2023-12-23 02:52:23,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914853.3333333334, ans=0.1 2023-12-23 02:52:27,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=914920.0, ans=0.125 2023-12-23 02:52:37,228 INFO [train.py:886] (1/4) Epoch 29, batch 3800, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4943959.47 frames. 
], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:52:44,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=914986.6666666666, ans=0.2 2023-12-23 02:52:47,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=915053.3333333334, ans=0.125 2023-12-23 02:52:50,531 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.921e+01 3.295e+01 3.394e+01 3.553e+01 4.650e+01, threshold=6.788e+01, percent-clipped=0.0 2023-12-23 02:52:51,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=915053.3333333334, ans=0.0 2023-12-23 02:52:51,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=915053.3333333334, ans=0.2 2023-12-23 02:53:29,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=915320.0, ans=0.0 2023-12-23 02:53:29,911 INFO [train.py:886] (1/4) Epoch 29, batch 3850, loss[loss=0.009509, audio_tagging_loss=0.009509, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4944639.22 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:53:49,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=915453.3333333334, ans=0.0 2023-12-23 02:53:52,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=915453.3333333334, ans=0.0 2023-12-23 02:53:57,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=915453.3333333334, ans=0.025 2023-12-23 02:53:58,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2023-12-23 02:54:04,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=915520.0, ans=0.125 2023-12-23 02:54:10,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=915586.6666666666, ans=0.125 2023-12-23 02:54:19,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=915653.3333333334, ans=0.0 2023-12-23 02:54:21,531 INFO [train.py:886] (1/4) Epoch 29, batch 3900, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4945120.41 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:54:24,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2023-12-23 02:54:26,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=915653.3333333334, ans=0.0 2023-12-23 02:54:34,395 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+01 3.127e+01 3.290e+01 3.423e+01 3.887e+01, threshold=6.579e+01, percent-clipped=0.0 2023-12-23 02:54:43,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915786.6666666666, ans=0.1 2023-12-23 02:54:43,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=915786.6666666666, ans=0.125 2023-12-23 02:54:44,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=915786.6666666666, ans=0.125 2023-12-23 02:55:08,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=915920.0, ans=0.125 2023-12-23 02:55:09,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=915920.0, ans=0.2 2023-12-23 02:55:13,362 INFO [train.py:886] (1/4) Epoch 29, batch 3950, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4944505.88 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:55:19,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=915986.6666666666, ans=0.0 2023-12-23 02:55:33,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=916120.0, ans=0.0 2023-12-23 02:55:39,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.78 vs. limit=22.5 2023-12-23 02:55:44,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=916186.6666666666, ans=0.125 2023-12-23 02:55:53,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=916253.3333333334, ans=0.0 2023-12-23 02:55:56,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=916253.3333333334, ans=0.125 2023-12-23 02:56:05,088 INFO [train.py:886] (1/4) Epoch 29, batch 4000, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4947364.11 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 128.0 2023-12-23 02:56:12,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=916320.0, ans=0.125 2023-12-23 02:56:17,053 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.298e+01 3.392e+01 3.535e+01 4.607e+01, threshold=6.784e+01, percent-clipped=0.0 2023-12-23 02:56:19,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.29 vs. 
limit=15.0 2023-12-23 02:56:27,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=916453.3333333334, ans=0.1 2023-12-23 02:56:33,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=916453.3333333334, ans=0.125 2023-12-23 02:56:36,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=916520.0, ans=0.1 2023-12-23 02:56:42,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=916520.0, ans=0.125 2023-12-23 02:56:44,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=916586.6666666666, ans=10.0 2023-12-23 02:56:54,925 INFO [train.py:886] (1/4) Epoch 29, batch 4050, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4945824.00 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:56:56,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=916653.3333333334, ans=0.2 2023-12-23 02:57:07,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=916720.0, ans=0.125 2023-12-23 02:57:16,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=916786.6666666666, ans=0.0 2023-12-23 02:57:42,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2023-12-23 02:57:47,883 INFO [train.py:886] (1/4) Epoch 29, batch 4100, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4941456.87 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:58:00,225 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.035e+01 3.272e+01 3.403e+01 3.607e+01 4.047e+01, threshold=6.806e+01, percent-clipped=0.0 2023-12-23 02:58:07,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=917120.0, ans=0.0 2023-12-23 02:58:09,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.53 vs. 
limit=15.0 2023-12-23 02:58:13,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=917120.0, ans=0.0 2023-12-23 02:58:17,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=917120.0, ans=0.125 2023-12-23 02:58:19,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=917186.6666666666, ans=0.0 2023-12-23 02:58:23,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=917186.6666666666, ans=0.09899494936611666 2023-12-23 02:58:24,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=917186.6666666666, ans=0.0 2023-12-23 02:58:24,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=917186.6666666666, ans=0.125 2023-12-23 02:58:28,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=917253.3333333334, ans=0.5 2023-12-23 02:58:28,723 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:58:39,294 INFO [train.py:886] (1/4) Epoch 29, batch 4150, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4941705.23 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:59:00,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=917453.3333333334, ans=0.125 2023-12-23 02:59:12,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=917520.0, ans=0.125 2023-12-23 02:59:23,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2023-12-23 02:59:25,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=917586.6666666666, ans=0.07 2023-12-23 02:59:27,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=917586.6666666666, ans=0.125 2023-12-23 02:59:30,206 INFO [train.py:886] (1/4) Epoch 29, batch 4200, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24048.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4945128.44 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:59:43,290 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.179e+01 3.334e+01 3.481e+01 3.882e+01, threshold=6.668e+01, percent-clipped=0.0 2023-12-23 03:00:02,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=917853.3333333334, ans=0.125 2023-12-23 03:00:04,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=917853.3333333334, ans=0.125 2023-12-23 03:00:21,204 INFO [train.py:886] (1/4) Epoch 29, batch 4250, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24074.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4947660.62 frames. 
], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 03:00:24,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=917986.6666666666, ans=0.125 2023-12-23 03:00:45,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=918120.0, ans=0.125 2023-12-23 03:00:49,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=918120.0, ans=0.0 2023-12-23 03:00:58,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=918186.6666666666, ans=0.125 2023-12-23 03:01:05,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2023-12-23 03:01:06,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=918253.3333333334, ans=0.0 2023-12-23 03:01:11,962 INFO [train.py:886] (1/4) Epoch 29, batch 4300, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4953073.82 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 03:01:25,827 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.896e+01 3.260e+01 3.361e+01 3.479e+01 4.825e+01, threshold=6.722e+01, percent-clipped=0.0 2023-12-23 03:01:26,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=918386.6666666666, ans=0.125 2023-12-23 03:01:28,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=918386.6666666666, ans=0.0 2023-12-23 03:01:39,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=918453.3333333334, ans=0.2 2023-12-23 03:01:43,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=918520.0, ans=0.0 2023-12-23 03:01:54,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-12-23 03:02:02,826 INFO [train.py:886] (1/4) Epoch 29, batch 4350, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4954821.10 frames. 
], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:02:26,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=918786.6666666666, ans=0.125 2023-12-23 03:02:27,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=918786.6666666666, ans=0.0 2023-12-23 03:02:29,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=918786.6666666666, ans=0.125 2023-12-23 03:02:30,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=918786.6666666666, ans=0.0 2023-12-23 03:02:33,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=918853.3333333334, ans=0.2 2023-12-23 03:02:39,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=918853.3333333334, ans=0.125 2023-12-23 03:02:49,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=918920.0, ans=0.125 2023-12-23 03:02:54,089 INFO [train.py:886] (1/4) Epoch 29, batch 4400, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24040.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4947968.50 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:03:06,952 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.260e+01 3.393e+01 3.638e+01 4.099e+01, threshold=6.787e+01, percent-clipped=0.0 2023-12-23 03:03:15,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=919120.0, ans=0.0 2023-12-23 03:03:22,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=919120.0, ans=0.0 2023-12-23 03:03:38,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=919253.3333333334, ans=0.125 2023-12-23 03:03:45,716 INFO [train.py:886] (1/4) Epoch 29, batch 4450, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4945027.72 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:04:20,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=919520.0, ans=0.0 2023-12-23 03:04:32,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=919586.6666666666, ans=0.125 2023-12-23 03:04:38,030 INFO [train.py:886] (1/4) Epoch 29, batch 4500, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4937207.47 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:04:39,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=919653.3333333334, ans=0.0 2023-12-23 03:04:41,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. 
limit=15.0 2023-12-23 03:04:42,957 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:04:51,065 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.905e+01 3.193e+01 3.342e+01 3.524e+01 4.106e+01, threshold=6.683e+01, percent-clipped=0.0 2023-12-23 03:04:51,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2023-12-23 03:05:01,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=919786.6666666666, ans=0.1 2023-12-23 03:05:12,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2023-12-23 03:05:29,610 INFO [train.py:886] (1/4) Epoch 29, batch 4550, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4945758.55 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:05:40,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=920053.3333333334, ans=0.0 2023-12-23 03:05:46,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=920053.3333333334, ans=0.2 2023-12-23 03:05:48,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=920053.3333333334, ans=0.0 2023-12-23 03:05:52,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=920120.0, ans=0.0 2023-12-23 03:05:54,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=920120.0, ans=0.125 2023-12-23 03:06:18,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=920253.3333333334, ans=0.125 2023-12-23 03:06:20,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=920320.0, ans=0.2 2023-12-23 03:06:21,698 INFO [train.py:886] (1/4) Epoch 29, batch 4600, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4954847.19 frames. 
], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:06:21,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=920320.0, ans=0.125 2023-12-23 03:06:24,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=920320.0, ans=0.125 2023-12-23 03:06:35,367 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.907e+01 3.235e+01 3.344e+01 3.450e+01 3.985e+01, threshold=6.687e+01, percent-clipped=0.0 2023-12-23 03:06:38,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=920386.6666666666, ans=0.125 2023-12-23 03:06:56,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=920520.0, ans=0.035 2023-12-23 03:06:58,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=920520.0, ans=0.0 2023-12-23 03:07:00,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2023-12-23 03:07:13,870 INFO [train.py:886] (1/4) Epoch 29, batch 4650, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4953323.16 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:07:22,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=920653.3333333334, ans=0.125 2023-12-23 03:07:23,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=920720.0, ans=0.0 2023-12-23 03:07:31,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=920720.0, ans=0.2 2023-12-23 03:07:38,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920786.6666666666, ans=0.1 2023-12-23 03:07:50,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2023-12-23 03:08:04,173 INFO [train.py:886] (1/4) Epoch 29, batch 4700, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4951008.32 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:08:06,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=920986.6666666666, ans=0.1 2023-12-23 03:08:15,990 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.319e+01 3.435e+01 3.584e+01 4.089e+01, threshold=6.869e+01, percent-clipped=0.0 2023-12-23 03:08:28,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-12-23 03:08:31,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=921186.6666666666, ans=0.125 2023-12-23 03:08:51,956 INFO [train.py:886] (1/4) Epoch 29, batch 4750, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24750.00 frames. 
], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4947674.87 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:09:26,349 INFO [train.py:886] (1/4) Epoch 30, batch 0, loss[loss=0.03506, audio_tagging_loss=0.03506, over 19933.00 frames. ], tot_loss[loss=0.03506, audio_tagging_loss=0.03506, over 19933.00 frames. ], batch size: 107, lr: 3.63e-03, grad_scale: 32.0 2023-12-23 03:09:26,350 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 03:09:33,886 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6048, 5.7861, 5.3073, 5.5960], device='cuda:1') 2023-12-23 03:09:47,395 INFO [train.py:917] (1/4) Epoch 30, validation: loss=0.03363, audio_tagging_loss=0.03363, over 3737520.00 frames. 2023-12-23 03:09:47,396 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 03:09:59,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=15.0 2023-12-23 03:10:02,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-12-23 03:10:03,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=921493.3333333334, ans=0.125 2023-12-23 03:10:06,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=921560.0, ans=0.0 2023-12-23 03:10:17,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2023-12-23 03:10:22,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=921626.6666666666, ans=0.125 2023-12-23 03:10:33,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.95 vs. limit=22.5 2023-12-23 03:10:35,649 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.394e+01 3.725e+01 4.735e+01 9.451e+01, threshold=7.450e+01, percent-clipped=7.0 2023-12-23 03:10:37,600 INFO [train.py:886] (1/4) Epoch 30, batch 50, loss[loss=0.01755, audio_tagging_loss=0.01755, over 25000.00 frames. ], tot_loss[loss=0.0201, audio_tagging_loss=0.0201, over 1113304.63 frames. 
], batch size: 100, lr: 3.63e-03, grad_scale: 32.0 2023-12-23 03:10:48,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=921826.6666666666, ans=0.125 2023-12-23 03:11:00,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=921893.3333333334, ans=0.125 2023-12-23 03:11:02,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=921893.3333333334, ans=0.0 2023-12-23 03:11:06,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=921893.3333333334, ans=0.1 2023-12-23 03:11:12,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=921960.0, ans=0.125 2023-12-23 03:11:19,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=922026.6666666666, ans=0.1 2023-12-23 03:11:30,722 INFO [train.py:886] (1/4) Epoch 30, batch 100, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24750.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 1969250.60 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:11:37,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=922093.3333333334, ans=0.5 2023-12-23 03:11:41,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=922160.0, ans=0.125 2023-12-23 03:11:43,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=922160.0, ans=0.0 2023-12-23 03:12:10,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=922360.0, ans=0.05 2023-12-23 03:12:14,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-23 03:12:18,429 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.004e+01 3.542e+01 3.730e+01 3.937e+01 4.567e+01, threshold=7.459e+01, percent-clipped=0.0 2023-12-23 03:12:20,994 INFO [train.py:886] (1/4) Epoch 30, batch 150, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 2634224.93 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:12:36,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-12-23 03:12:49,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.36 vs. 
limit=15.0 2023-12-23 03:12:49,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=922560.0, ans=0.0 2023-12-23 03:12:57,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=922626.6666666666, ans=0.0 2023-12-23 03:13:01,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2023-12-23 03:13:02,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=922626.6666666666, ans=0.125 2023-12-23 03:13:05,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=922693.3333333334, ans=0.125 2023-12-23 03:13:06,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2023-12-23 03:13:08,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=922693.3333333334, ans=0.125 2023-12-23 03:13:10,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=922693.3333333334, ans=0.125 2023-12-23 03:13:13,257 INFO [train.py:886] (1/4) Epoch 30, batch 200, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 3154619.84 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:13:20,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2023-12-23 03:13:24,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=922826.6666666666, ans=0.07 2023-12-23 03:13:25,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=922826.6666666666, ans=0.0 2023-12-23 03:14:00,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=923026.6666666666, ans=0.2 2023-12-23 03:14:02,675 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.057e+01 3.258e+01 3.386e+01 3.497e+01 4.082e+01, threshold=6.772e+01, percent-clipped=0.0 2023-12-23 03:14:05,237 INFO [train.py:886] (1/4) Epoch 30, batch 250, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 3548487.73 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:14:05,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=923093.3333333334, ans=0.125 2023-12-23 03:14:45,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-12-23 03:14:49,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=923360.0, ans=0.0 2023-12-23 03:14:56,018 INFO [train.py:886] (1/4) Epoch 30, batch 300, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. 
], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 3857606.22 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:15:04,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=923426.6666666666, ans=0.2 2023-12-23 03:15:08,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=923493.3333333334, ans=0.09899494936611666 2023-12-23 03:15:18,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=923560.0, ans=0.125 2023-12-23 03:15:33,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=923626.6666666666, ans=0.125 2023-12-23 03:15:38,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=923693.3333333334, ans=0.125 2023-12-23 03:15:46,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=923693.3333333334, ans=0.0 2023-12-23 03:15:46,813 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.964e+01 3.195e+01 3.394e+01 3.540e+01 4.201e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 03:15:48,772 INFO [train.py:886] (1/4) Epoch 30, batch 350, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4092653.73 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:16:00,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=923826.6666666666, ans=0.125 2023-12-23 03:16:16,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=923893.3333333334, ans=0.015 2023-12-23 03:16:17,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. limit=10.0 2023-12-23 03:16:21,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0 2023-12-23 03:16:26,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2023-12-23 03:16:28,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=924026.6666666666, ans=0.0 2023-12-23 03:16:38,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=924026.6666666666, ans=0.125 2023-12-23 03:16:39,593 INFO [train.py:886] (1/4) Epoch 30, batch 400, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4279376.00 frames. 
], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:17:00,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=924226.6666666666, ans=0.2 2023-12-23 03:17:10,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924293.3333333334, ans=0.0 2023-12-23 03:17:18,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=924293.3333333334, ans=0.1 2023-12-23 03:17:30,132 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.158e+01 3.345e+01 3.496e+01 3.973e+01, threshold=6.691e+01, percent-clipped=0.0 2023-12-23 03:17:32,074 INFO [train.py:886] (1/4) Epoch 30, batch 450, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4429917.93 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:17:38,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-12-23 03:17:43,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=924493.3333333334, ans=0.125 2023-12-23 03:17:49,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=924493.3333333334, ans=0.2 2023-12-23 03:17:58,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=924560.0, ans=0.125 2023-12-23 03:18:12,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=924693.3333333334, ans=0.125 2023-12-23 03:18:24,807 INFO [train.py:886] (1/4) Epoch 30, batch 500, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4546542.39 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:18:26,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=924760.0, ans=0.0 2023-12-23 03:18:28,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=924760.0, ans=0.125 2023-12-23 03:18:41,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=924826.6666666666, ans=0.015 2023-12-23 03:18:41,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=924826.6666666666, ans=0.125 2023-12-23 03:18:45,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. 
limit=15.0 2023-12-23 03:18:48,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=924893.3333333334, ans=0.125 2023-12-23 03:18:49,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=924893.3333333334, ans=15.0 2023-12-23 03:18:50,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0 2023-12-23 03:19:02,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=924960.0, ans=0.125 2023-12-23 03:19:06,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-12-23 03:19:13,979 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.840e+01 3.274e+01 3.359e+01 3.528e+01 4.075e+01, threshold=6.718e+01, percent-clipped=0.0 2023-12-23 03:19:15,867 INFO [train.py:886] (1/4) Epoch 30, batch 550, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4641531.16 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:19:37,721 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:19:37,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=925226.6666666666, ans=0.125 2023-12-23 03:19:54,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2023-12-23 03:19:59,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=925360.0, ans=0.0 2023-12-23 03:20:01,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=925360.0, ans=0.0 2023-12-23 03:20:08,026 INFO [train.py:886] (1/4) Epoch 30, batch 600, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4710812.74 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:20:33,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.17 vs. limit=15.0 2023-12-23 03:20:52,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=925693.3333333334, ans=0.0 2023-12-23 03:20:57,413 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.230e+01 3.382e+01 3.569e+01 4.053e+01, threshold=6.765e+01, percent-clipped=0.0 2023-12-23 03:20:59,380 INFO [train.py:886] (1/4) Epoch 30, batch 650, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4761925.87 frames. 
], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:21:11,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=925826.6666666666, ans=0.0 2023-12-23 03:21:29,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925960.0, ans=0.1 2023-12-23 03:21:33,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=925960.0, ans=0.125 2023-12-23 03:21:33,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=925960.0, ans=0.025 2023-12-23 03:21:36,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=925960.0, ans=0.0 2023-12-23 03:21:50,665 INFO [train.py:886] (1/4) Epoch 30, batch 700, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4805738.76 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:21:51,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=926093.3333333334, ans=0.125 2023-12-23 03:22:08,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=926160.0, ans=0.0 2023-12-23 03:22:10,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=926160.0, ans=0.2 2023-12-23 03:22:10,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=926226.6666666666, ans=0.125 2023-12-23 03:22:11,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2023-12-23 03:22:15,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=926226.6666666666, ans=22.5 2023-12-23 03:22:38,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926360.0, ans=0.1 2023-12-23 03:22:41,096 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.229e+01 3.388e+01 3.605e+01 3.882e+01, threshold=6.777e+01, percent-clipped=0.0 2023-12-23 03:22:43,043 INFO [train.py:886] (1/4) Epoch 30, batch 750, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4840479.90 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:22:43,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=926426.6666666666, ans=0.0 2023-12-23 03:22:52,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=926493.3333333334, ans=0.2 2023-12-23 03:23:19,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=15.0 2023-12-23 03:23:34,688 INFO [train.py:886] (1/4) Epoch 30, batch 800, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4867104.37 frames. 
], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:23:37,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=926760.0, ans=0.0 2023-12-23 03:23:43,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926760.0, ans=0.1 2023-12-23 03:24:04,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=926960.0, ans=0.0 2023-12-23 03:24:24,603 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.909e+01 3.234e+01 3.358e+01 3.542e+01 4.031e+01, threshold=6.717e+01, percent-clipped=0.0 2023-12-23 03:24:26,554 INFO [train.py:886] (1/4) Epoch 30, batch 850, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4885134.71 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:24:32,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=927093.3333333334, ans=10.0 2023-12-23 03:24:43,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927160.0, ans=0.1 2023-12-23 03:25:12,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=927360.0, ans=0.125 2023-12-23 03:25:15,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=927360.0, ans=0.0 2023-12-23 03:25:19,745 INFO [train.py:886] (1/4) Epoch 30, batch 900, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4905224.98 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:25:26,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2023-12-23 03:25:29,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=927493.3333333334, ans=0.125 2023-12-23 03:25:35,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=927493.3333333334, ans=0.0 2023-12-23 03:25:41,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.26 vs. limit=10.0 2023-12-23 03:26:03,405 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.575e-01 2023-12-23 03:26:05,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=927693.3333333334, ans=0.1 2023-12-23 03:26:07,768 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.263e+01 3.411e+01 3.563e+01 4.036e+01, threshold=6.823e+01, percent-clipped=0.0 2023-12-23 03:26:08,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=927760.0, ans=0.125 2023-12-23 03:26:10,370 INFO [train.py:886] (1/4) Epoch 30, batch 950, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4909605.84 frames. 
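Note on the "Whitening: ... metric=X vs. limit=Y" lines: they compare a measure of how far a layer's activation covariance is from a scaled identity against a scheduled limit, and are logged when the metric approaches or exceeds the limit (e.g. metric=22.20 vs. limit=22.5 above). One such metric is mean(lambda^2) / mean(lambda)^2 over covariance eigenvalues, computable from traces without an eigendecomposition; it equals 1.0 for perfectly whitened features and grows with eigenvalue spread. The sketch below uses that definition for illustration; it is not necessarily the exact formula in scaling.py.

    # Sketch of a whitening metric: for activations x of shape
    # (num_frames, num_channels), split channels into groups, center,
    # form the covariance C, and return mean(eig^2)/mean(eig)^2 via
    # cg * trace(C @ C) / trace(C)^2. Equals 1.0 iff C is a multiple of I.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape
        assert c % num_groups == 0
        cg = c // num_groups
        x = x.reshape(n, num_groups, cg).transpose(0, 1)   # (g, n, cg)
        x = x - x.mean(dim=1, keepdim=True)                # center per group
        cov = x.transpose(1, 2) @ x / n                    # (g, cg, cg)
        tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)       # trace(C)
        tr2 = (cov * cov).sum(dim=(1, 2))                  # trace(C @ C)
        return (cg * tr2 / tr.clamp(min=1e-20) ** 2).mean().item()

    x = torch.randn(1000, 512)
    print(whitening_metric(x))   # ~1-1.5 for roughly independent channels
    corr = torch.randn(1000, 1).expand(1000, 512) + 0.01 * torch.randn(1000, 512)
    print(whitening_metric(corr))  # far larger: channels are nearly rank-1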
], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:26:17,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2023-12-23 03:26:22,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=927826.6666666666, ans=0.125 2023-12-23 03:26:27,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=12.0 2023-12-23 03:26:37,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=927893.3333333334, ans=0.5 2023-12-23 03:26:38,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2023-12-23 03:26:43,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=927960.0, ans=0.125 2023-12-23 03:27:02,694 INFO [train.py:886] (1/4) Epoch 30, batch 1000, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4912982.97 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:27:06,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2023-12-23 03:27:08,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=928093.3333333334, ans=0.125 2023-12-23 03:27:09,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=928093.3333333334, ans=0.125 2023-12-23 03:27:31,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-12-23 03:27:46,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-12-23 03:27:52,140 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.168e+01 3.324e+01 3.493e+01 4.167e+01, threshold=6.648e+01, percent-clipped=0.0 2023-12-23 03:27:54,032 INFO [train.py:886] (1/4) Epoch 30, batch 1050, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4919527.11 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:28:06,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=928493.3333333334, ans=0.125 2023-12-23 03:28:37,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=928693.3333333334, ans=0.05 2023-12-23 03:28:39,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=928693.3333333334, ans=0.0 2023-12-23 03:28:41,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. 
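Note on the train.py:886 progress lines: each reports the current batch's loss alongside tot_loss, a running average weighted by the number of frames processed. A minimal sketch of that frame-weighted accumulation follows; the fractional cumulative frame counts in the log (e.g. 4909605.84) hint that older batches are down-weighted by some decay, which this plain-sum version omits.

    # Sketch of the frame-weighted running loss behind the
    # "tot_loss[loss=..., over N frames]" entries. Any decay of old
    # statistics that icefall applies is omitted here.
    class LossTracker:
        def __init__(self):
            self.loss_sum = 0.0   # sum of (per-frame loss * frames)
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum += batch_loss * batch_frames
            self.frames += batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = LossTracker()
    tracker.update(0.01197, 24750)   # per-batch values as in the lines above
    tracker.update(0.01192, 24750)
    print(f"tot_loss[loss={tracker.tot_loss:.5f}, "
          f"over {tracker.frames:.2f} frames]")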
limit=22.5 2023-12-23 03:28:44,581 INFO [train.py:886] (1/4) Epoch 30, batch 1100, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4925329.64 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:28:55,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=928826.6666666666, ans=0.0 2023-12-23 03:29:06,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=928893.3333333334, ans=0.125 2023-12-23 03:29:07,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2023-12-23 03:29:25,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=929026.6666666666, ans=0.0 2023-12-23 03:29:25,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.29 vs. limit=15.0 2023-12-23 03:29:35,377 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.223e+01 3.331e+01 3.541e+01 4.042e+01, threshold=6.662e+01, percent-clipped=0.0 2023-12-23 03:29:37,300 INFO [train.py:886] (1/4) Epoch 30, batch 1150, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4933977.23 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:29:41,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=929093.3333333334, ans=0.125 2023-12-23 03:29:46,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=929160.0, ans=0.025 2023-12-23 03:29:59,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=929226.6666666666, ans=0.0 2023-12-23 03:30:27,924 INFO [train.py:886] (1/4) Epoch 30, batch 1200, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4943813.15 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:30:30,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-12-23 03:30:50,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=929560.0, ans=0.125 2023-12-23 03:31:16,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=929693.3333333334, ans=0.0 2023-12-23 03:31:18,623 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.244e+01 3.373e+01 3.542e+01 4.199e+01, threshold=6.745e+01, percent-clipped=0.0 2023-12-23 03:31:18,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=929693.3333333334, ans=0.1 2023-12-23 03:31:19,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=15.0 2023-12-23 03:31:20,510 INFO [train.py:886] (1/4) Epoch 30, batch 1250, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4945272.27 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:31:42,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=929893.3333333334, ans=0.0 2023-12-23 03:31:43,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=929893.3333333334, ans=0.125 2023-12-23 03:32:01,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=930026.6666666666, ans=0.09899494936611666 2023-12-23 03:32:12,538 INFO [train.py:886] (1/4) Epoch 30, batch 1300, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4937001.47 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:32:18,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=930093.3333333334, ans=0.1 2023-12-23 03:32:46,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.17 vs. limit=22.5 2023-12-23 03:32:49,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=930293.3333333334, ans=0.0 2023-12-23 03:32:50,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=930293.3333333334, ans=6.0 2023-12-23 03:32:59,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=930360.0, ans=0.1 2023-12-23 03:33:01,545 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+01 3.237e+01 3.393e+01 3.536e+01 4.080e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 03:33:03,487 INFO [train.py:886] (1/4) Epoch 30, batch 1350, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4942140.42 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:33:18,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=930493.3333333334, ans=0.1 2023-12-23 03:33:20,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=930493.3333333334, ans=0.09899494936611666 2023-12-23 03:33:21,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. 
limit=15.0 2023-12-23 03:33:28,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=930560.0, ans=0.125 2023-12-23 03:33:35,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=930626.6666666666, ans=0.5 2023-12-23 03:33:35,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=930626.6666666666, ans=0.2 2023-12-23 03:33:55,842 INFO [train.py:886] (1/4) Epoch 30, batch 1400, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4944767.25 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:33:58,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=930760.0, ans=0.0 2023-12-23 03:34:02,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-12-23 03:34:04,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=930826.6666666666, ans=0.0 2023-12-23 03:34:19,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2023-12-23 03:34:41,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=931026.6666666666, ans=0.1 2023-12-23 03:34:44,810 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.190e+01 3.312e+01 3.498e+01 3.963e+01, threshold=6.624e+01, percent-clipped=0.0 2023-12-23 03:34:46,710 INFO [train.py:886] (1/4) Epoch 30, batch 1450, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4947476.05 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:35:24,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=931293.3333333334, ans=0.125 2023-12-23 03:35:39,329 INFO [train.py:886] (1/4) Epoch 30, batch 1500, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4955816.95 frames. 
], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:35:44,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=931426.6666666666, ans=0.125 2023-12-23 03:35:49,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=931493.3333333334, ans=0.1 2023-12-23 03:35:53,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=931493.3333333334, ans=0.2 2023-12-23 03:36:21,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=931693.3333333334, ans=0.0 2023-12-23 03:36:21,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=931693.3333333334, ans=0.125 2023-12-23 03:36:29,276 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.252e+01 3.425e+01 3.586e+01 4.549e+01, threshold=6.850e+01, percent-clipped=0.0 2023-12-23 03:36:31,154 INFO [train.py:886] (1/4) Epoch 30, batch 1550, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4949976.19 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:36:43,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=931826.6666666666, ans=0.1 2023-12-23 03:36:51,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.43 vs. limit=5.0 2023-12-23 03:36:54,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=931893.3333333334, ans=0.125 2023-12-23 03:37:00,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=931893.3333333334, ans=0.125 2023-12-23 03:37:04,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=931960.0, ans=0.04949747468305833 2023-12-23 03:37:12,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=932026.6666666666, ans=0.125 2023-12-23 03:37:23,210 INFO [train.py:886] (1/4) Epoch 30, batch 1600, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4943412.67 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:37:45,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. 
limit=15.0 2023-12-23 03:37:48,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=932226.6666666666, ans=0.1 2023-12-23 03:37:58,272 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:38:03,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=932293.3333333334, ans=0.125 2023-12-23 03:38:14,422 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.234e+01 3.380e+01 3.507e+01 4.339e+01, threshold=6.761e+01, percent-clipped=0.0 2023-12-23 03:38:16,310 INFO [train.py:886] (1/4) Epoch 30, batch 1650, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4946852.33 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:38:23,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=932426.6666666666, ans=0.125 2023-12-23 03:38:42,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.02 vs. limit=15.0 2023-12-23 03:38:57,564 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:39:03,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=932693.3333333334, ans=0.125 2023-12-23 03:39:07,499 INFO [train.py:886] (1/4) Epoch 30, batch 1700, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4948437.61 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:39:17,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=932826.6666666666, ans=0.0 2023-12-23 03:39:23,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=932826.6666666666, ans=0.125 2023-12-23 03:39:31,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=932893.3333333334, ans=0.125 2023-12-23 03:39:46,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=932960.0, ans=0.125 2023-12-23 03:39:46,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=932960.0, ans=0.0 2023-12-23 03:39:52,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-12-23 03:39:58,099 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.929e+01 3.264e+01 3.396e+01 3.507e+01 4.450e+01, threshold=6.793e+01, percent-clipped=0.0 2023-12-23 03:39:59,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933093.3333333334, ans=0.1 2023-12-23 03:40:00,116 INFO [train.py:886] (1/4) Epoch 30, batch 1750, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4954898.53 frames. 
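Note on the slowly decaying lr field (3.62e-03 earlier in the epoch, 3.60e-03 by batch 1650 above): the rate shrinks smoothly with both the global batch index and the epoch, which is the shape of an Eden-style schedule. The sketch below assumes the Eden rule lr = base_lr * ((step^2 + lr_batches^2)/lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2)/lr_epochs^2)^-0.25; the constants used are illustrative, not read from this run's configuration.

    # Sketch of an Eden-style learning-rate rule consistent with the
    # "lr: 3.62e-03 -> 3.60e-03" drift above. base_lr, lr_batches and
    # lr_epochs are illustrative values.
    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    for step, epoch in [(1000, 1.0), (50000, 10.0), (140000, 30.0)]:
        print(f"epoch {epoch:>4}: lr={eden_lr(0.045, step, epoch):.2e}")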
], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:40:16,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=933160.0, ans=0.0 2023-12-23 03:40:20,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=933226.6666666666, ans=0.125 2023-12-23 03:40:29,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=933226.6666666666, ans=0.0 2023-12-23 03:40:29,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=933226.6666666666, ans=0.0 2023-12-23 03:40:40,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-12-23 03:40:55,023 INFO [train.py:886] (1/4) Epoch 30, batch 1800, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4954238.03 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:40:59,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=933426.6666666666, ans=0.1 2023-12-23 03:41:10,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=933493.3333333334, ans=0.1 2023-12-23 03:41:23,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=933560.0, ans=0.0 2023-12-23 03:41:30,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=933626.6666666666, ans=0.0 2023-12-23 03:41:30,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=933626.6666666666, ans=0.0 2023-12-23 03:41:43,726 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+01 3.246e+01 3.412e+01 3.538e+01 4.076e+01, threshold=6.825e+01, percent-clipped=0.0 2023-12-23 03:41:45,633 INFO [train.py:886] (1/4) Epoch 30, batch 1850, loss[loss=0.01093, audio_tagging_loss=0.01093, over 24044.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4952151.43 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:41:52,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.66 vs. limit=22.5 2023-12-23 03:42:02,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=15.0 2023-12-23 03:42:06,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=933893.3333333334, ans=0.125 2023-12-23 03:42:18,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=933960.0, ans=0.0 2023-12-23 03:42:23,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=933960.0, ans=0.125 2023-12-23 03:42:33,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=934026.6666666666, ans=0.125 2023-12-23 03:42:37,479 INFO [train.py:886] (1/4) Epoch 30, batch 1900, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4949806.10 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:42:37,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=934093.3333333334, ans=10.0 2023-12-23 03:42:40,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2023-12-23 03:42:41,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2023-12-23 03:42:58,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=934226.6666666666, ans=0.0 2023-12-23 03:42:59,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=934226.6666666666, ans=0.125 2023-12-23 03:43:01,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=934226.6666666666, ans=0.05 2023-12-23 03:43:06,922 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:43:13,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=934293.3333333334, ans=0.125 2023-12-23 03:43:17,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-12-23 03:43:26,845 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.318e+01 3.464e+01 3.638e+01 4.261e+01, threshold=6.929e+01, percent-clipped=0.0 2023-12-23 03:43:29,483 INFO [train.py:886] (1/4) Epoch 30, batch 1950, loss[loss=0.0108, audio_tagging_loss=0.0108, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4946696.41 frames. 
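Note on the balancer entries above (prob, min_positive, max_abs, min_abs): they describe constraints on per-channel activation statistics that are enforced stochastically, with probability prob, by adjusting gradients in the backward pass rather than changing the forward output. The sketch below is a strong simplification of that idea under assumed names and a made-up correction rule; it illustrates the mechanism, not icefall's actual Balancer.

    # Sketch of a "balancer": identity in forward; in backward it adds a
    # small corrective term to channels whose statistics violate limits such
    # as min_positive (min fraction of positive values) or max_abs (max mean
    # |x|). The correction rule is illustrative only.
    import torch

    class BalancerFunction(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive, max_abs, scale):
            ctx.save_for_backward(x)
            ctx.cfg = (min_positive, max_abs, scale)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_positive, max_abs, scale = ctx.cfg
            dims = tuple(d for d in range(x.dim() - 1))
            frac_pos = (x > 0).float().mean(dim=dims)   # per-channel stats
            mean_abs = x.abs().mean(dim=dims)
            # push channels with too few positives up, too-large ones down
            push_up = (frac_pos < min_positive).float()
            push_down = (mean_abs > max_abs).float() * torch.sign(x)
            correction = scale * grad_out.abs().mean() * (push_down - push_up)
            return grad_out + correction, None, None, None

    x = torch.randn(100, 256, requires_grad=True)
    y = BalancerFunction.apply(x, 0.05, 10.0, 0.04)
    y.sum().backward()   # grads pass through, nudged where limits are broken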
], batch size: 99, lr: 3.60e-03, grad_scale: 32.0 2023-12-23 03:43:44,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=934493.3333333334, ans=0.1 2023-12-23 03:44:01,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=934626.6666666666, ans=0.125 2023-12-23 03:44:05,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=934626.6666666666, ans=0.0 2023-12-23 03:44:21,282 INFO [train.py:886] (1/4) Epoch 30, batch 2000, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4943572.89 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:44:22,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.29 vs. limit=15.0 2023-12-23 03:44:24,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=934760.0, ans=0.125 2023-12-23 03:44:27,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=934760.0, ans=0.0 2023-12-23 03:44:44,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=934893.3333333334, ans=0.125 2023-12-23 03:45:05,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. limit=10.0 2023-12-23 03:45:09,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=935026.6666666666, ans=0.125 2023-12-23 03:45:11,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-12-23 03:45:11,879 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.193e+01 3.321e+01 3.490e+01 4.233e+01, threshold=6.642e+01, percent-clipped=0.0 2023-12-23 03:45:13,811 INFO [train.py:886] (1/4) Epoch 30, batch 2050, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4943401.02 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:45:23,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=935160.0, ans=0.2 2023-12-23 03:45:47,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=935293.3333333334, ans=0.0 2023-12-23 03:45:52,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=935293.3333333334, ans=0.125 2023-12-23 03:46:05,444 INFO [train.py:886] (1/4) Epoch 30, batch 2100, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4949181.58 frames. 
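Note on the grad_scale field: it is the loss-scaling factor from fp16 mixed-precision training, and it doubles from 32.0 to 64.0 at batch 2000 above, which is how a GradScaler behaves after a long overflow-free stretch. The loop below sketches that mechanism with the standard torch.cuda.amp API; the model, optimizer, and data are placeholders, not this recipe's.

    # Sketch of the fp16 loss-scaling loop behind "grad_scale: 32.0 -> 64.0".
    # Model/optimizer/data are placeholders; the amp API calls are real.
    import torch

    model = torch.nn.Linear(80, 527).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3.6e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    for step in range(4000):
        x = torch.randn(8, 80, device="cuda")
        target = torch.randint(0, 2, (8, 527), device="cuda").float()
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                model(x), target)
        scaler.scale(loss).backward()   # scale the loss before backward
        scaler.step(optimizer)          # unscales grads; skips step on inf/nan
        scaler.update()                 # doubles the scale after a clean run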
], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:46:06,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=935426.6666666666, ans=0.07 2023-12-23 03:46:30,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=935560.0, ans=0.125 2023-12-23 03:46:34,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=935560.0, ans=0.5 2023-12-23 03:46:37,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=935626.6666666666, ans=0.0 2023-12-23 03:46:54,852 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.237e+01 3.407e+01 3.545e+01 4.095e+01, threshold=6.814e+01, percent-clipped=0.0 2023-12-23 03:46:56,782 INFO [train.py:886] (1/4) Epoch 30, batch 2150, loss[loss=0.01519, audio_tagging_loss=0.01519, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4952402.54 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:47:10,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-12-23 03:47:19,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=935893.3333333334, ans=0.0 2023-12-23 03:47:26,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=935893.3333333334, ans=0.125 2023-12-23 03:47:28,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=935960.0, ans=0.125 2023-12-23 03:47:49,778 INFO [train.py:886] (1/4) Epoch 30, batch 2200, loss[loss=0.01256, audio_tagging_loss=0.01256, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4950693.63 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:47:50,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=936093.3333333334, ans=0.07 2023-12-23 03:48:37,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-12-23 03:48:37,953 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.929e+01 3.275e+01 3.471e+01 3.594e+01 4.076e+01, threshold=6.942e+01, percent-clipped=0.0 2023-12-23 03:48:39,962 INFO [train.py:886] (1/4) Epoch 30, batch 2250, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4947161.82 frames. 
], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:48:44,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=936426.6666666666, ans=0.0 2023-12-23 03:48:45,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=936426.6666666666, ans=0.0 2023-12-23 03:48:49,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=936426.6666666666, ans=0.05 2023-12-23 03:48:54,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=936493.3333333334, ans=0.125 2023-12-23 03:49:02,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=936560.0, ans=0.5 2023-12-23 03:49:03,911 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:49:05,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=936560.0, ans=0.125 2023-12-23 03:49:08,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=936560.0, ans=0.125 2023-12-23 03:49:32,510 INFO [train.py:886] (1/4) Epoch 30, batch 2300, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4949205.28 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:50:11,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=936960.0, ans=0.0 2023-12-23 03:50:12,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937026.6666666666, ans=0.1 2023-12-23 03:50:20,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=937026.6666666666, ans=0.125 2023-12-23 03:50:21,596 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.201e+01 3.316e+01 3.475e+01 3.980e+01, threshold=6.632e+01, percent-clipped=0.0 2023-12-23 03:50:24,251 INFO [train.py:886] (1/4) Epoch 30, batch 2350, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4955467.62 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0 2023-12-23 03:50:28,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=937093.3333333334, ans=0.125 2023-12-23 03:50:29,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=937093.3333333334, ans=0.125 2023-12-23 03:50:30,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=937093.3333333334, ans=0.125 2023-12-23 03:50:34,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=937160.0, ans=0.125 2023-12-23 03:50:41,554 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. 
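Note on the "WithLoss: name=..., loss-sum=..." entries: they report an auxiliary penalty attached to an intermediate tensor (here, attention weights), accumulated between log intervals; loss-sum=0.000e+00 means the constraint was inactive for that stretch. The sketch below shows this attach-and-log pattern with a made-up penalty (attention rows whose entropy falls below a floor); both the rule and the names are illustrative, not icefall's exact mechanism.

    # Sketch of the bookkeeping behind "WithLoss: name=..., loss-sum=...".
    # The entropy-floor penalty is illustrative; a real implementation would
    # also add `loss` into the training objective, not just log it.
    import torch

    class WithLoss(torch.nn.Module):
        def __init__(self, name: str, limit: float = 0.1):
            super().__init__()
            self.name = name
            self.limit = limit
            self.loss_sum = 0.0

        def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
            p = attn_weights.clamp(min=1e-20)
            entropy = -(p * p.log()).sum(dim=-1)
            loss = (self.limit - entropy).clamp(min=0.0).sum()
            self.loss_sum += float(loss.detach())
            return attn_weights   # pass-through

    aux = WithLoss("encoder.encoders.2.encoder.layers.2.self_attn_weights")
    w = torch.softmax(torch.randn(4, 10, 10), dim=-1)
    aux(w)
    print(f"WithLoss: name={aux.name}, loss-sum={aux.loss_sum:.3e}")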
limit=15.0 2023-12-23 03:50:46,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=937226.6666666666, ans=0.0 2023-12-23 03:50:58,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=937293.3333333334, ans=0.125 2023-12-23 03:51:06,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=937360.0, ans=0.125 2023-12-23 03:51:11,209 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:51:11,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=937360.0, ans=0.1 2023-12-23 03:51:15,748 INFO [train.py:886] (1/4) Epoch 30, batch 2400, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4953082.55 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:51:35,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=937493.3333333334, ans=0.125 2023-12-23 03:51:54,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=937626.6666666666, ans=0.125 2023-12-23 03:52:05,954 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.939e+01 3.251e+01 3.352e+01 3.497e+01 5.027e+01, threshold=6.704e+01, percent-clipped=0.0 2023-12-23 03:52:06,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937693.3333333334, ans=0.1 2023-12-23 03:52:08,562 INFO [train.py:886] (1/4) Epoch 30, batch 2450, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4955848.63 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:52:11,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=937760.0, ans=0.125 2023-12-23 03:52:12,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-12-23 03:52:21,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937826.6666666666, ans=0.1 2023-12-23 03:52:22,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=937826.6666666666, ans=0.2 2023-12-23 03:52:29,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-12-23 03:52:44,848 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:52:44,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=937960.0, ans=0.2 2023-12-23 03:52:58,803 INFO [train.py:886] (1/4) Epoch 30, batch 2500, loss[loss=0.0156, audio_tagging_loss=0.0156, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4957353.25 frames. 
], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:53:21,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=938226.6666666666, ans=0.05 2023-12-23 03:53:23,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2023-12-23 03:53:28,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938226.6666666666, ans=0.1 2023-12-23 03:53:49,320 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.342e+01 3.462e+01 3.638e+01 4.220e+01, threshold=6.925e+01, percent-clipped=0.0 2023-12-23 03:53:51,309 INFO [train.py:886] (1/4) Epoch 30, batch 2550, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4946325.20 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:54:07,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=938493.3333333334, ans=0.05 2023-12-23 03:54:09,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=938493.3333333334, ans=0.1 2023-12-23 03:54:33,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=938693.3333333334, ans=0.125 2023-12-23 03:54:34,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=938693.3333333334, ans=0.125 2023-12-23 03:54:42,920 INFO [train.py:886] (1/4) Epoch 30, batch 2600, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4944444.04 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:54:54,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=938826.6666666666, ans=0.125 2023-12-23 03:55:26,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=939026.6666666666, ans=0.2 2023-12-23 03:55:29,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=939026.6666666666, ans=0.125 2023-12-23 03:55:32,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.923e+01 3.221e+01 3.370e+01 3.530e+01 4.052e+01, threshold=6.740e+01, percent-clipped=0.0 2023-12-23 03:55:34,927 INFO [train.py:886] (1/4) Epoch 30, batch 2650, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4938961.52 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:55:39,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.98 vs. 
limit=12.0 2023-12-23 03:55:41,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=939093.3333333334, ans=0.125 2023-12-23 03:55:51,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=939160.0, ans=0.0 2023-12-23 03:55:51,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2023-12-23 03:56:07,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=939293.3333333334, ans=0.125 2023-12-23 03:56:28,538 INFO [train.py:886] (1/4) Epoch 30, batch 2700, loss[loss=0.01406, audio_tagging_loss=0.01406, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4943882.30 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:56:33,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-12-23 03:56:34,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-12-23 03:56:55,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=939560.0, ans=0.0 2023-12-23 03:57:07,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=939626.6666666666, ans=0.125 2023-12-23 03:57:13,946 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:57:16,650 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.960e+01 3.256e+01 3.372e+01 3.519e+01 4.379e+01, threshold=6.744e+01, percent-clipped=0.0 2023-12-23 03:57:19,295 INFO [train.py:886] (1/4) Epoch 30, batch 2750, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4951526.06 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:57:24,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=939760.0, ans=0.125 2023-12-23 03:57:25,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=939760.0, ans=0.1 2023-12-23 03:57:25,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=939760.0, ans=0.07 2023-12-23 03:57:29,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-23 03:58:10,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=940026.6666666666, ans=0.125 2023-12-23 03:58:11,695 INFO [train.py:886] (1/4) Epoch 30, batch 2800, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4948019.37 frames. 
], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:58:34,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=12.0 2023-12-23 03:59:01,825 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.252e+01 3.401e+01 3.539e+01 4.080e+01, threshold=6.801e+01, percent-clipped=0.0 2023-12-23 03:59:03,686 INFO [train.py:886] (1/4) Epoch 30, batch 2850, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4943223.01 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 03:59:11,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=940426.6666666666, ans=0.125 2023-12-23 03:59:37,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=940626.6666666666, ans=0.0 2023-12-23 03:59:54,502 INFO [train.py:886] (1/4) Epoch 30, batch 2900, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4941267.61 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:00:08,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=940826.6666666666, ans=0.125 2023-12-23 04:00:22,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=940893.3333333334, ans=0.1 2023-12-23 04:00:23,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=940893.3333333334, ans=0.125 2023-12-23 04:00:43,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.852e+01 3.200e+01 3.351e+01 3.510e+01 3.990e+01, threshold=6.702e+01, percent-clipped=0.0 2023-12-23 04:00:45,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=941093.3333333334, ans=0.125 2023-12-23 04:00:45,812 INFO [train.py:886] (1/4) Epoch 30, batch 2950, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4949883.44 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:00:54,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=941093.3333333334, ans=0.125 2023-12-23 04:01:10,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=941226.6666666666, ans=0.125 2023-12-23 04:01:13,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=941226.6666666666, ans=0.0 2023-12-23 04:01:23,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.09 vs. limit=10.0 2023-12-23 04:01:37,760 INFO [train.py:886] (1/4) Epoch 30, batch 3000, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4949428.71 frames. 
], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:01:37,761 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 04:01:50,388 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5675, 3.9599, 4.0775, 3.7502], device='cuda:1') 2023-12-23 04:01:59,246 INFO [train.py:917] (1/4) Epoch 30, validation: loss=0.03287, audio_tagging_loss=0.03287, over 3737520.00 frames. 2023-12-23 04:01:59,246 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 04:02:15,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=941493.3333333334, ans=0.09899494936611666 2023-12-23 04:02:16,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=941493.3333333334, ans=0.07 2023-12-23 04:02:47,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=941693.3333333334, ans=0.125 2023-12-23 04:02:48,227 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.895e+01 3.215e+01 3.346e+01 3.573e+01 4.041e+01, threshold=6.693e+01, percent-clipped=0.0 2023-12-23 04:02:50,145 INFO [train.py:886] (1/4) Epoch 30, batch 3050, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4959328.94 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:02:52,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941760.0, ans=0.1 2023-12-23 04:03:35,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2023-12-23 04:03:41,699 INFO [train.py:886] (1/4) Epoch 30, batch 3100, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4959872.40 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:03:45,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=942093.3333333334, ans=0.025 2023-12-23 04:03:58,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2023-12-23 04:04:01,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=942226.6666666666, ans=0.125 2023-12-23 04:04:05,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
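Note on the validation records above ("Computing validation loss", the frame-weighted validation loss, and "Maximum memory allocated so far"): the trainer periodically runs a no-grad pass over a fixed dev set, averages the loss over frames, and reports the CUDA peak-memory high-water mark. A minimal sketch follows; compute_loss and dev_loader are placeholders, while torch.cuda.max_memory_allocated is the real API behind the memory figure.

    # Sketch of the periodic validation pass. compute_loss(model, batch) is
    # assumed to return (mean loss, num_frames) for one batch.
    import torch

    @torch.no_grad()
    def validate(model, dev_loader, compute_loss):
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)
            loss_sum += float(loss) * num_frames
            frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"validation: loss={loss_sum / frames:.4f}, "
              f"over {frames:.2f} frames.")
        print(f"Maximum memory allocated so far is {mem_mb}MB")
        return loss_sum / frames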
limit=12.0 2023-12-23 04:04:07,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=942226.6666666666, ans=0.95 2023-12-23 04:04:17,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942293.3333333334, ans=0.1 2023-12-23 04:04:30,575 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.332e+01 3.451e+01 3.634e+01 4.318e+01, threshold=6.902e+01, percent-clipped=0.0 2023-12-23 04:04:32,501 INFO [train.py:886] (1/4) Epoch 30, batch 3150, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4956383.04 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0 2023-12-23 04:04:50,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=942493.3333333334, ans=15.0 2023-12-23 04:05:04,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=942626.6666666666, ans=0.125 2023-12-23 04:05:06,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942626.6666666666, ans=0.1 2023-12-23 04:05:06,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=942626.6666666666, ans=0.125 2023-12-23 04:05:10,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=12.0 2023-12-23 04:05:13,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=942693.3333333334, ans=0.125 2023-12-23 04:05:22,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=942693.3333333334, ans=0.125 2023-12-23 04:05:24,516 INFO [train.py:886] (1/4) Epoch 30, batch 3200, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4954114.50 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:05:38,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=942826.6666666666, ans=0.2 2023-12-23 04:05:38,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942826.6666666666, ans=0.1 2023-12-23 04:05:46,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=942893.3333333334, ans=0.125 2023-12-23 04:06:00,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942960.0, ans=0.1 2023-12-23 04:06:02,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=942960.0, ans=0.125 2023-12-23 04:06:12,111 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.760e+01 3.281e+01 3.417e+01 3.634e+01 4.167e+01, threshold=6.835e+01, percent-clipped=0.0 2023-12-23 04:06:14,696 INFO [train.py:886] (1/4) Epoch 30, batch 3250, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. 
], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4951100.42 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:06:34,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=943160.0, ans=0.125 2023-12-23 04:06:44,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=943293.3333333334, ans=0.125 2023-12-23 04:06:45,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=943293.3333333334, ans=0.0 2023-12-23 04:07:06,665 INFO [train.py:886] (1/4) Epoch 30, batch 3300, loss[loss=0.01036, audio_tagging_loss=0.01036, over 22058.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4946625.24 frames. ], batch size: 107, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:07:28,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=943560.0, ans=0.125 2023-12-23 04:07:36,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=943626.6666666666, ans=0.0 2023-12-23 04:07:50,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=943693.3333333334, ans=0.0 2023-12-23 04:07:55,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=943693.3333333334, ans=0.125 2023-12-23 04:07:55,901 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.982e+01 3.270e+01 3.395e+01 3.544e+01 4.890e+01, threshold=6.790e+01, percent-clipped=0.0 2023-12-23 04:07:58,516 INFO [train.py:886] (1/4) Epoch 30, batch 3350, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4956141.60 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:08:07,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=943826.6666666666, ans=0.0 2023-12-23 04:08:15,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=943826.6666666666, ans=0.2 2023-12-23 04:08:18,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=943893.3333333334, ans=0.125 2023-12-23 04:08:19,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=943893.3333333334, ans=0.0 2023-12-23 04:08:34,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=943960.0, ans=0.125 2023-12-23 04:08:35,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=943960.0, ans=0.125 2023-12-23 04:08:46,794 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:08:48,482 INFO [train.py:886] (1/4) Epoch 30, batch 3400, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4954764.78 frames. 
], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:08:58,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=944160.0, ans=0.2 2023-12-23 04:09:02,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=944160.0, ans=0.0 2023-12-23 04:09:07,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=944160.0, ans=0.125 2023-12-23 04:09:37,166 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.991e+01 3.324e+01 3.423e+01 3.624e+01 4.414e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 04:09:39,083 INFO [train.py:886] (1/4) Epoch 30, batch 3450, loss[loss=0.01286, audio_tagging_loss=0.01286, over 23944.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4950886.60 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:10:01,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=944560.0, ans=0.0 2023-12-23 04:10:06,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=944560.0, ans=0.125 2023-12-23 04:10:07,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=944560.0, ans=0.125 2023-12-23 04:10:07,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-12-23 04:10:08,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=944626.6666666666, ans=0.125 2023-12-23 04:10:09,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=944626.6666666666, ans=0.125 2023-12-23 04:10:11,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-12-23 04:10:13,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=944626.6666666666, ans=0.125 2023-12-23 04:10:13,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=944626.6666666666, ans=0.0 2023-12-23 04:10:30,259 INFO [train.py:886] (1/4) Epoch 30, batch 3500, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4942169.12 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:10:51,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=944893.3333333334, ans=0.2 2023-12-23 04:11:04,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=944960.0, ans=0.125 2023-12-23 04:11:05,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.44 vs. 
limit=15.0 2023-12-23 04:11:16,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=945026.6666666666, ans=0.1 2023-12-23 04:11:19,705 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.755e+01 3.226e+01 3.344e+01 3.619e+01 4.102e+01, threshold=6.688e+01, percent-clipped=0.0 2023-12-23 04:11:21,611 INFO [train.py:886] (1/4) Epoch 30, batch 3550, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4945731.44 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:11:36,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945160.0, ans=0.1 2023-12-23 04:11:38,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=15.0 2023-12-23 04:11:42,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=945226.6666666666, ans=0.125 2023-12-23 04:11:47,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945226.6666666666, ans=0.1 2023-12-23 04:11:55,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=12.0 2023-12-23 04:11:59,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=945293.3333333334, ans=0.2 2023-12-23 04:12:05,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945360.0, ans=0.1 2023-12-23 04:12:14,352 INFO [train.py:886] (1/4) Epoch 30, batch 3600, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4948281.91 frames. 
], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:12:14,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=945426.6666666666, ans=0.0 2023-12-23 04:12:23,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=945493.3333333334, ans=0.0 2023-12-23 04:12:35,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=945560.0, ans=0.125 2023-12-23 04:12:38,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=945560.0, ans=0.125 2023-12-23 04:12:47,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=945626.6666666666, ans=0.0 2023-12-23 04:12:49,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=945626.6666666666, ans=0.09899494936611666 2023-12-23 04:12:51,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=945626.6666666666, ans=0.1 2023-12-23 04:12:57,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=945693.3333333334, ans=0.125 2023-12-23 04:13:02,253 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.867e+01 3.251e+01 3.392e+01 3.502e+01 4.191e+01, threshold=6.784e+01, percent-clipped=0.0 2023-12-23 04:13:04,189 INFO [train.py:886] (1/4) Epoch 30, batch 3650, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4950189.13 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:13:04,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=945760.0, ans=0.035 2023-12-23 04:13:18,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-12-23 04:13:20,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-12-23 04:13:25,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2023-12-23 04:13:26,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5 2023-12-23 04:13:40,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. 
limit=22.5 2023-12-23 04:13:44,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=945960.0, ans=0.0 2023-12-23 04:13:49,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=946026.6666666666, ans=0.125 2023-12-23 04:13:51,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=946026.6666666666, ans=0.125 2023-12-23 04:13:57,400 INFO [train.py:886] (1/4) Epoch 30, batch 3700, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4957936.59 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:14:20,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=946226.6666666666, ans=0.0 2023-12-23 04:14:24,480 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:14:32,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=946293.3333333334, ans=0.125 2023-12-23 04:14:43,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=946360.0, ans=0.0 2023-12-23 04:14:44,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=946360.0, ans=0.0 2023-12-23 04:14:45,608 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.248e+01 3.376e+01 3.563e+01 4.172e+01, threshold=6.751e+01, percent-clipped=0.0 2023-12-23 04:14:47,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=12.0 2023-12-23 04:14:47,534 INFO [train.py:886] (1/4) Epoch 30, batch 3750, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4955867.11 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:14:53,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=946426.6666666666, ans=0.1 2023-12-23 04:15:05,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=15.0 2023-12-23 04:15:10,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=946560.0, ans=0.125 2023-12-23 04:15:30,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=946693.3333333334, ans=0.125 2023-12-23 04:15:35,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.96 vs. limit=12.0 2023-12-23 04:15:39,178 INFO [train.py:886] (1/4) Epoch 30, batch 3800, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4954027.16 frames. 
], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:15:52,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=946826.6666666666, ans=0.125 2023-12-23 04:15:52,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=946826.6666666666, ans=0.125 2023-12-23 04:15:54,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=946826.6666666666, ans=15.0 2023-12-23 04:16:04,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=946893.3333333334, ans=0.2 2023-12-23 04:16:04,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=946893.3333333334, ans=0.125 2023-12-23 04:16:29,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.289e+01 3.427e+01 3.573e+01 5.060e+01, threshold=6.854e+01, percent-clipped=0.0 2023-12-23 04:16:29,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=947026.6666666666, ans=0.125 2023-12-23 04:16:31,294 INFO [train.py:886] (1/4) Epoch 30, batch 3850, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4946804.32 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:17:11,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=947293.3333333334, ans=0.0 2023-12-23 04:17:13,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=947360.0, ans=0.04949747468305833 2023-12-23 04:17:22,783 INFO [train.py:886] (1/4) Epoch 30, batch 3900, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4948686.75 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:17:24,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=947426.6666666666, ans=0.05 2023-12-23 04:17:28,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=947426.6666666666, ans=0.125 2023-12-23 04:17:29,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=947426.6666666666, ans=10.0 2023-12-23 04:17:43,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=947560.0, ans=0.125 2023-12-23 04:17:51,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=947560.0, ans=0.125 2023-12-23 04:18:12,359 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.238e+01 3.413e+01 3.556e+01 4.213e+01, threshold=6.826e+01, percent-clipped=0.0 2023-12-23 04:18:14,275 INFO [train.py:886] (1/4) Epoch 30, batch 3950, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4953608.60 frames. 
], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:18:21,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=947760.0, ans=0.125 2023-12-23 04:18:25,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=947826.6666666666, ans=0.0 2023-12-23 04:18:27,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0 2023-12-23 04:18:27,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=22.5 2023-12-23 04:18:33,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=947826.6666666666, ans=0.0 2023-12-23 04:18:37,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.74 vs. limit=22.5 2023-12-23 04:18:57,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=948026.6666666666, ans=0.125 2023-12-23 04:18:58,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=948026.6666666666, ans=0.07 2023-12-23 04:19:07,785 INFO [train.py:886] (1/4) Epoch 30, batch 4000, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4961020.51 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 128.0 2023-12-23 04:19:31,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=948226.6666666666, ans=0.2 2023-12-23 04:19:45,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=12.0 2023-12-23 04:19:50,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=948360.0, ans=0.0 2023-12-23 04:19:57,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.995e+01 3.306e+01 3.420e+01 3.603e+01 4.772e+01, threshold=6.841e+01, percent-clipped=0.0 2023-12-23 04:19:58,692 INFO [train.py:886] (1/4) Epoch 30, batch 4050, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4958948.50 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:20:03,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=948426.6666666666, ans=0.125 2023-12-23 04:20:05,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=948426.6666666666, ans=0.0 2023-12-23 04:20:30,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=948626.6666666666, ans=0.2 2023-12-23 04:20:51,075 INFO [train.py:886] (1/4) Epoch 30, batch 4100, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4951332.51 frames. 
], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:20:53,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=948760.0, ans=0.2 2023-12-23 04:20:53,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2023-12-23 04:21:07,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=948826.6666666666, ans=0.125 2023-12-23 04:21:07,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=948826.6666666666, ans=0.2 2023-12-23 04:21:31,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=949026.6666666666, ans=0.0 2023-12-23 04:21:41,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.264e+01 3.402e+01 3.586e+01 4.088e+01, threshold=6.804e+01, percent-clipped=0.0 2023-12-23 04:21:43,649 INFO [train.py:886] (1/4) Epoch 30, batch 4150, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4950911.25 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:21:53,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=949160.0, ans=0.125 2023-12-23 04:22:04,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=949226.6666666666, ans=0.125 2023-12-23 04:22:15,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949293.3333333334, ans=0.1 2023-12-23 04:22:34,546 INFO [train.py:886] (1/4) Epoch 30, batch 4200, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4949293.19 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:22:36,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=949426.6666666666, ans=0.125 2023-12-23 04:22:41,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=949426.6666666666, ans=0.1 2023-12-23 04:22:46,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=949493.3333333334, ans=0.2 2023-12-23 04:22:50,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949493.3333333334, ans=0.1 2023-12-23 04:22:51,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-12-23 04:22:54,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. 
limit=15.0 2023-12-23 04:23:25,798 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.859e+01 3.208e+01 3.393e+01 3.521e+01 4.162e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 04:23:26,753 INFO [train.py:886] (1/4) Epoch 30, batch 4250, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4951065.07 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:23:27,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=949760.0, ans=0.2 2023-12-23 04:23:30,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=949760.0, ans=0.0 2023-12-23 04:23:34,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-12-23 04:23:42,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=949826.6666666666, ans=0.125 2023-12-23 04:24:01,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=949960.0, ans=0.2 2023-12-23 04:24:02,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=949960.0, ans=0.125 2023-12-23 04:24:02,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2023-12-23 04:24:02,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0 2023-12-23 04:24:04,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=949960.0, ans=0.125 2023-12-23 04:24:04,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=949960.0, ans=0.025 2023-12-23 04:24:05,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=949960.0, ans=0.0 2023-12-23 04:24:17,250 INFO [train.py:886] (1/4) Epoch 30, batch 4300, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4949325.96 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:24:47,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0 2023-12-23 04:25:08,197 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.009e+01 3.316e+01 3.415e+01 3.581e+01 5.246e+01, threshold=6.831e+01, percent-clipped=0.0 2023-12-23 04:25:09,184 INFO [train.py:886] (1/4) Epoch 30, batch 4350, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24750.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4950463.80 frames. 
], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:25:16,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=950426.6666666666, ans=0.125 2023-12-23 04:25:24,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=950493.3333333334, ans=0.0 2023-12-23 04:25:37,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=950560.0, ans=0.0 2023-12-23 04:25:48,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=950626.6666666666, ans=0.125 2023-12-23 04:26:02,669 INFO [train.py:886] (1/4) Epoch 30, batch 4400, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4947909.02 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:26:02,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=950760.0, ans=0.1 2023-12-23 04:26:04,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=950760.0, ans=0.125 2023-12-23 04:26:04,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=950760.0, ans=0.125 2023-12-23 04:26:40,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=950960.0, ans=0.0 2023-12-23 04:26:41,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2023-12-23 04:26:48,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-12-23 04:26:51,212 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.941e+01 3.268e+01 3.436e+01 3.608e+01 4.178e+01, threshold=6.872e+01, percent-clipped=0.0 2023-12-23 04:26:52,195 INFO [train.py:886] (1/4) Epoch 30, batch 4450, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4947552.14 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:27:10,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=951160.0, ans=0.1 2023-12-23 04:27:22,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=951293.3333333334, ans=0.2 2023-12-23 04:27:23,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=951293.3333333334, ans=0.125 2023-12-23 04:27:40,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=951360.0, ans=0.125 2023-12-23 04:27:45,079 INFO [train.py:886] (1/4) Epoch 30, batch 4500, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4953943.81 frames. 
], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:28:29,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=951693.3333333334, ans=0.2 2023-12-23 04:28:34,499 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.270e+01 3.347e+01 3.646e+01 4.061e+01, threshold=6.694e+01, percent-clipped=0.0 2023-12-23 04:28:34,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=951760.0, ans=0.125 2023-12-23 04:28:35,444 INFO [train.py:886] (1/4) Epoch 30, batch 4550, loss[loss=0.01298, audio_tagging_loss=0.01298, over 23956.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4955251.41 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:28:42,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=951760.0, ans=0.1 2023-12-23 04:28:45,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=951760.0, ans=0.0 2023-12-23 04:28:59,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2023-12-23 04:29:20,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=952026.6666666666, ans=0.0 2023-12-23 04:29:21,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2023-12-23 04:29:27,439 INFO [train.py:886] (1/4) Epoch 30, batch 4600, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4958900.41 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:29:29,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=952093.3333333334, ans=22.5 2023-12-23 04:29:51,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=952226.6666666666, ans=0.2 2023-12-23 04:29:59,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=952293.3333333334, ans=0.0 2023-12-23 04:30:02,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=952293.3333333334, ans=0.0 2023-12-23 04:30:19,064 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.884e+01 3.239e+01 3.388e+01 3.549e+01 4.736e+01, threshold=6.775e+01, percent-clipped=0.0 2023-12-23 04:30:20,034 INFO [train.py:886] (1/4) Epoch 30, batch 4650, loss[loss=0.01136, audio_tagging_loss=0.01136, over 22864.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4958767.55 frames. 
], batch size: 107, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:30:38,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=952560.0, ans=0.0 2023-12-23 04:30:45,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=952560.0, ans=0.125 2023-12-23 04:30:59,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0 2023-12-23 04:30:59,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=952626.6666666666, ans=0.125 2023-12-23 04:31:02,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-12-23 04:31:10,390 INFO [train.py:886] (1/4) Epoch 30, batch 4700, loss[loss=0.01615, audio_tagging_loss=0.01615, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4954505.42 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:31:15,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=952760.0, ans=0.95 2023-12-23 04:31:28,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=952893.3333333334, ans=0.1 2023-12-23 04:31:43,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=952960.0, ans=0.035 2023-12-23 04:31:56,733 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.362e+01 3.489e+01 3.665e+01 4.317e+01, threshold=6.977e+01, percent-clipped=0.0 2023-12-23 04:31:57,649 INFO [train.py:886] (1/4) Epoch 30, batch 4750, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4951052.78 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:32:32,840 INFO [train.py:886] (1/4) Epoch 31, batch 0, loss[loss=0.02705, audio_tagging_loss=0.02705, over 24041.00 frames. ], tot_loss[loss=0.02705, audio_tagging_loss=0.02705, over 24041.00 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2023-12-23 04:32:32,841 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 04:32:43,536 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6877, 2.8509, 2.4654, 2.2541, 3.8452, 3.3887, 4.0453, 2.3369], device='cuda:1') 2023-12-23 04:32:44,153 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5506, 3.3962, 3.9561, 4.1285], device='cuda:1') 2023-12-23 04:32:54,307 INFO [train.py:917] (1/4) Epoch 31, validation: loss=0.03297, audio_tagging_loss=0.03297, over 3737520.00 frames. 2023-12-23 04:32:54,307 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 04:33:20,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. 
limit=6.0 2023-12-23 04:33:30,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=953400.0, ans=0.125 2023-12-23 04:33:43,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=953466.6666666666, ans=10.0 2023-12-23 04:33:43,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2023-12-23 04:33:44,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=953533.3333333334, ans=0.2 2023-12-23 04:33:44,880 INFO [train.py:886] (1/4) Epoch 31, batch 50, loss[loss=0.018, audio_tagging_loss=0.018, over 25000.00 frames. ], tot_loss[loss=0.01997, audio_tagging_loss=0.01997, over 1117444.44 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2023-12-23 04:33:45,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=953533.3333333334, ans=0.09899494936611666 2023-12-23 04:34:11,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=953666.6666666666, ans=0.5 2023-12-23 04:34:18,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=953733.3333333334, ans=0.0 2023-12-23 04:34:20,593 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.845e+01 4.121e+01 4.670e+01 9.872e+01, threshold=8.242e+01, percent-clipped=8.0 2023-12-23 04:34:27,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=953800.0, ans=0.0 2023-12-23 04:34:36,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2023-12-23 04:34:37,996 INFO [train.py:886] (1/4) Epoch 31, batch 100, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01732, audio_tagging_loss=0.01732, over 1973553.64 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:34:47,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2023-12-23 04:34:53,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-23 04:34:54,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2023-12-23 04:34:55,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2023-12-23 04:35:08,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.44 vs. 
limit=15.0 2023-12-23 04:35:16,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=954066.6666666666, ans=0.125 2023-12-23 04:35:17,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.17 vs. limit=22.5 2023-12-23 04:35:18,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=954133.3333333334, ans=0.0 2023-12-23 04:35:25,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=954133.3333333334, ans=0.0 2023-12-23 04:35:29,340 INFO [train.py:886] (1/4) Epoch 31, batch 150, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01576, audio_tagging_loss=0.01576, over 2633631.50 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:35:34,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:35:38,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2023-12-23 04:35:39,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=954266.6666666666, ans=0.125 2023-12-23 04:36:05,258 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.100e+01 3.397e+01 3.570e+01 3.705e+01 4.340e+01, threshold=7.141e+01, percent-clipped=0.0 2023-12-23 04:36:05,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=954400.0, ans=0.2 2023-12-23 04:36:18,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=954466.6666666666, ans=0.0 2023-12-23 04:36:22,020 INFO [train.py:886] (1/4) Epoch 31, batch 200, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 3154839.84 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:36:36,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0 2023-12-23 04:36:52,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=12.0 2023-12-23 04:36:52,849 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:37:14,763 INFO [train.py:886] (1/4) Epoch 31, batch 250, loss[loss=0.01144, audio_tagging_loss=0.01144, over 22214.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 3543659.40 frames. ], batch size: 107, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:37:21,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. 
limit=15.0 2023-12-23 04:37:30,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=954933.3333333334, ans=0.125 2023-12-23 04:37:46,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-12-23 04:37:50,813 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.236e+01 3.404e+01 3.581e+01 4.137e+01, threshold=6.809e+01, percent-clipped=0.0 2023-12-23 04:37:51,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=955066.6666666666, ans=6.0 2023-12-23 04:38:01,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-23 04:38:01,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-12-23 04:38:06,950 INFO [train.py:886] (1/4) Epoch 31, batch 300, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 3852966.72 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:38:19,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=955266.6666666666, ans=0.125 2023-12-23 04:38:30,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=955333.3333333334, ans=0.025 2023-12-23 04:38:32,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=955333.3333333334, ans=0.125 2023-12-23 04:38:35,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955333.3333333334, ans=0.1 2023-12-23 04:38:44,138 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:38:54,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=955466.6666666666, ans=0.125 2023-12-23 04:38:58,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=955533.3333333334, ans=0.125 2023-12-23 04:38:59,467 INFO [train.py:886] (1/4) Epoch 31, batch 350, loss[loss=0.01417, audio_tagging_loss=0.01417, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4093890.30 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:39:30,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=955733.3333333334, ans=0.1 2023-12-23 04:39:35,046 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.321e+01 3.436e+01 3.601e+01 4.104e+01, threshold=6.873e+01, percent-clipped=0.0 2023-12-23 04:39:52,397 INFO [train.py:886] (1/4) Epoch 31, batch 400, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4282125.72 frames. 
], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:40:13,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=956000.0, ans=10.0 2023-12-23 04:40:27,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=956066.6666666666, ans=0.1 2023-12-23 04:40:44,911 INFO [train.py:886] (1/4) Epoch 31, batch 450, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4426831.78 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:41:02,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=956266.6666666666, ans=0.125 2023-12-23 04:41:09,478 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:41:19,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=956400.0, ans=0.125 2023-12-23 04:41:19,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=956400.0, ans=0.125 2023-12-23 04:41:20,613 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.863e+01 3.195e+01 3.396e+01 3.625e+01 4.055e+01, threshold=6.792e+01, percent-clipped=0.0 2023-12-23 04:41:27,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=956466.6666666666, ans=0.125 2023-12-23 04:41:32,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=956466.6666666666, ans=0.125 2023-12-23 04:41:38,116 INFO [train.py:886] (1/4) Epoch 31, batch 500, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4548440.41 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:41:42,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=956533.3333333334, ans=0.2 2023-12-23 04:41:51,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=956600.0, ans=0.015 2023-12-23 04:42:09,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=956733.3333333334, ans=0.125 2023-12-23 04:42:10,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=956733.3333333334, ans=0.125 2023-12-23 04:42:11,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2023-12-23 04:42:11,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. 
limit=15.0 2023-12-23 04:42:13,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=956733.3333333334, ans=0.125 2023-12-23 04:42:19,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=956800.0, ans=0.2 2023-12-23 04:42:30,460 INFO [train.py:886] (1/4) Epoch 31, batch 550, loss[loss=0.01025, audio_tagging_loss=0.01025, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4636312.19 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:42:31,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-12-23 04:42:55,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2023-12-23 04:43:05,868 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.280e+01 3.449e+01 3.593e+01 5.125e+01, threshold=6.898e+01, percent-clipped=0.0 2023-12-23 04:43:15,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=957133.3333333334, ans=0.0 2023-12-23 04:43:22,642 INFO [train.py:886] (1/4) Epoch 31, batch 600, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24952.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4708680.64 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:43:46,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=957333.3333333334, ans=0.125 2023-12-23 04:43:47,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=957333.3333333334, ans=0.0 2023-12-23 04:43:53,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=957400.0, ans=0.025 2023-12-23 04:44:01,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=957400.0, ans=0.2 2023-12-23 04:44:07,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=957466.6666666666, ans=0.125 2023-12-23 04:44:15,884 INFO [train.py:886] (1/4) Epoch 31, batch 650, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4755134.65 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:44:30,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=957600.0, ans=0.0 2023-12-23 04:44:39,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=957666.6666666666, ans=0.125 2023-12-23 04:44:51,804 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.011e+01 3.328e+01 3.481e+01 3.624e+01 4.466e+01, threshold=6.961e+01, percent-clipped=0.0 2023-12-23 04:44:52,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=15.0 2023-12-23 04:44:57,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=957800.0, ans=0.1 2023-12-23 04:45:01,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=957800.0, ans=0.125 2023-12-23 04:45:05,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-12-23 04:45:07,158 INFO [train.py:886] (1/4) Epoch 31, batch 700, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4797667.42 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:45:07,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-12-23 04:45:13,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=957866.6666666666, ans=0.125 2023-12-23 04:45:30,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2023-12-23 04:45:32,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958000.0, ans=0.1 2023-12-23 04:45:37,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=958000.0, ans=0.0 2023-12-23 04:45:55,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2023-12-23 04:45:56,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=958133.3333333334, ans=0.125 2023-12-23 04:45:57,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=958133.3333333334, ans=6.0 2023-12-23 04:46:00,016 INFO [train.py:886] (1/4) Epoch 31, batch 750, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4837153.53 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:46:03,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=958200.0, ans=0.125 2023-12-23 04:46:14,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=12.0 2023-12-23 04:46:27,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=958333.3333333334, ans=0.125 2023-12-23 04:46:28,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=958333.3333333334, ans=0.0 2023-12-23 04:46:28,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. 
2023-12-23 04:46:35,654 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.246e+01 3.400e+01 3.570e+01 4.142e+01, threshold=6.799e+01, percent-clipped=0.0
2023-12-23 04:46:48,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=958466.6666666666, ans=0.125
2023-12-23 04:46:52,362 INFO [train.py:886] (1/4) Epoch 31, batch 800, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4865524.91 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0
2023-12-23 04:47:03,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0
2023-12-23 04:47:09,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=958600.0, ans=0.125
2023-12-23 04:47:14,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=958666.6666666666, ans=0.125
2023-12-23 04:47:22,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958733.3333333334, ans=0.1
2023-12-23 04:47:23,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=958733.3333333334, ans=0.125
2023-12-23 04:47:27,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=958733.3333333334, ans=0.0
2023-12-23 04:47:44,845 INFO [train.py:886] (1/4) Epoch 31, batch 850, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4883673.60 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0
2023-12-23 04:47:46,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=958866.6666666666, ans=0.125
2023-12-23 04:47:47,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5
2023-12-23 04:48:16,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0
2023-12-23 04:48:19,411 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.919e+01 3.282e+01 3.408e+01 3.536e+01 4.077e+01, threshold=6.816e+01, percent-clipped=0.0
2023-12-23 04:48:21,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0
2023-12-23 04:48:36,941 INFO [train.py:886] (1/4) Epoch 31, batch 900, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4901903.53 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0
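
The scaling.py:1022 lines fire when a Whiten module measures how far a layer's activations are from having a white (identity-like) covariance and the metric approaches or exceeds its configured limit. One plausible way to define such a metric, sketched below, is the eigenvalue spread of the per-group channel covariance: it equals 1.0 for perfectly white features and grows as the covariance becomes more anisotropic. This mirrors the spirit of the logged modules, not necessarily their exact formula.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x has shape (*, num_channels); channels are split into num_groups
    # groups, matching the num_groups/num_channels fields in the log.
    x = x.reshape(-1, x.shape[-1])
    num_channels = x.shape[-1]
    assert num_channels % num_groups == 0
    x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
    # Per-group channel covariance, shape (num_groups, d, d).
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]
    eigs = torch.linalg.eigvalsh(cov)  # symmetric matrix -> real eigenvalues
    # mean(eig^2) / mean(eig)^2: 1.0 iff all eigenvalues are equal (white).
    metric = (eigs ** 2).mean(dim=-1) / eigs.mean(dim=-1).clamp(min=1e-20) ** 2
    return metric.mean().item()
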
2023-12-23 04:48:58,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=959333.3333333334, ans=0.025
2023-12-23 04:49:00,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=959333.3333333334, ans=0.125
2023-12-23 04:49:00,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=959333.3333333334, ans=0.2
2023-12-23 04:49:06,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0
2023-12-23 04:49:12,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-12-23 04:49:26,588 INFO [train.py:886] (1/4) Epoch 31, batch 950, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4910108.16 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0
2023-12-23 04:49:29,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=959533.3333333334, ans=0.125
2023-12-23 04:49:45,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=959600.0, ans=0.0
2023-12-23 04:49:54,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=959666.6666666666, ans=0.1
2023-12-23 04:50:00,364 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 04:50:02,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=959733.3333333334, ans=10.0
2023-12-23 04:50:02,953 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.982e+01 3.282e+01 3.466e+01 3.627e+01 4.372e+01, threshold=6.931e+01, percent-clipped=0.0
2023-12-23 04:50:18,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=959800.0, ans=0.125
2023-12-23 04:50:20,524 INFO [train.py:886] (1/4) Epoch 31, batch 1000, loss[loss=0.01374, audio_tagging_loss=0.01374, over 22094.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4910318.84 frames. ], batch size: 107, lr: 3.49e-03, grad_scale: 32.0
2023-12-23 04:50:20,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=959866.6666666666, ans=0.0
2023-12-23 04:50:24,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0
2023-12-23 04:51:13,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=960133.3333333334, ans=0.125
2023-12-23 04:51:16,547 INFO [train.py:886] (1/4) Epoch 31, batch 1050, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4917967.98 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0
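
The lr field in the train.py:886 lines decays very slowly, from 3.50e-03 to 3.49e-03 around batch 950 above, consistent with a schedule that is smooth in both the batch index and the epoch. A sketch of the Eden-style formula used in icefall's zipformer recipes follows; the parameter values that would reproduce the exact logged numbers are not shown in this excerpt, so the arguments below are left symbolic.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    # Eden-style schedule: both factors are ~1 early on and decay like
    # an inverse fourth root once batch/epoch exceed the reference scales,
    # so late in training the lr changes only in the third significant
    # digit over thousands of batches, as in the log above.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
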
2023-12-23 04:51:25,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=960200.0, ans=0.1
2023-12-23 04:51:32,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=960266.6666666666, ans=0.2
2023-12-23 04:51:36,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=960333.3333333334, ans=0.1
2023-12-23 04:51:42,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=960333.3333333334, ans=0.125
2023-12-23 04:51:46,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=960400.0, ans=0.125
2023-12-23 04:51:52,741 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.261e+01 3.393e+01 3.606e+01 4.316e+01, threshold=6.786e+01, percent-clipped=0.0
2023-12-23 04:51:54,945 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 04:52:00,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0
2023-12-23 04:52:04,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=960466.6666666666, ans=0.125
2023-12-23 04:52:07,922 INFO [train.py:886] (1/4) Epoch 31, batch 1100, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4930413.40 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0
2023-12-23 04:52:16,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=960533.3333333334, ans=0.125
2023-12-23 04:52:17,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0
2023-12-23 04:52:41,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=960733.3333333334, ans=0.0
2023-12-23 04:52:47,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=960733.3333333334, ans=0.0
2023-12-23 04:52:50,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=960800.0, ans=0.125
2023-12-23 04:52:51,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=960800.0, ans=0.125
2023-12-23 04:52:55,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=960800.0, ans=0.09899494936611666
2023-12-23 04:53:01,547 INFO [train.py:886] (1/4) Epoch 31, batch 1150, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4940898.16 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0
2023-12-23 04:53:07,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.92 vs.
limit=22.5 2023-12-23 04:53:10,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=960933.3333333334, ans=0.125 2023-12-23 04:53:17,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=960933.3333333334, ans=0.2 2023-12-23 04:53:28,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=961000.0, ans=0.0 2023-12-23 04:53:31,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=961066.6666666666, ans=0.0 2023-12-23 04:53:36,112 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.954e+01 3.303e+01 3.399e+01 3.563e+01 3.907e+01, threshold=6.798e+01, percent-clipped=0.0 2023-12-23 04:53:49,684 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.024e-02 2023-12-23 04:53:51,376 INFO [train.py:886] (1/4) Epoch 31, batch 1200, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4953100.39 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:54:35,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=961466.6666666666, ans=0.125 2023-12-23 04:54:44,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=961533.3333333334, ans=0.125 2023-12-23 04:54:44,884 INFO [train.py:886] (1/4) Epoch 31, batch 1250, loss[loss=0.01429, audio_tagging_loss=0.01429, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4950350.74 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:54:53,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-23 04:55:00,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=961600.0, ans=0.0 2023-12-23 04:55:19,939 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.396e+01 3.498e+01 3.625e+01 4.153e+01, threshold=6.995e+01, percent-clipped=0.0 2023-12-23 04:55:34,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0 2023-12-23 04:55:36,758 INFO [train.py:886] (1/4) Epoch 31, batch 1300, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4944972.70 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:55:44,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=961866.6666666666, ans=0.2 2023-12-23 04:55:55,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=961933.3333333334, ans=0.05 2023-12-23 04:55:59,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=962000.0, ans=0.125 2023-12-23 04:56:04,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.23 vs. 
limit=22.5 2023-12-23 04:56:06,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=962000.0, ans=0.0 2023-12-23 04:56:20,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=962133.3333333334, ans=0.0 2023-12-23 04:56:24,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=962133.3333333334, ans=0.125 2023-12-23 04:56:28,652 INFO [train.py:886] (1/4) Epoch 31, batch 1350, loss[loss=0.01363, audio_tagging_loss=0.01363, over 23978.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4942739.20 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:56:35,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0 2023-12-23 04:56:50,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=962333.3333333334, ans=0.0 2023-12-23 04:56:54,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=962333.3333333334, ans=0.125 2023-12-23 04:56:59,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2023-12-23 04:57:04,305 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.922e+01 3.247e+01 3.434e+01 3.583e+01 4.287e+01, threshold=6.867e+01, percent-clipped=0.0 2023-12-23 04:57:18,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=962466.6666666666, ans=0.0 2023-12-23 04:57:18,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=962466.6666666666, ans=0.125 2023-12-23 04:57:22,559 INFO [train.py:886] (1/4) Epoch 31, batch 1400, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4943625.10 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:57:54,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-12-23 04:57:56,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=962733.3333333334, ans=10.0 2023-12-23 04:58:14,714 INFO [train.py:886] (1/4) Epoch 31, batch 1450, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4947609.96 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:58:18,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=962866.6666666666, ans=0.0 2023-12-23 04:58:34,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=962933.3333333334, ans=0.125 2023-12-23 04:58:35,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. 
limit=15.0 2023-12-23 04:58:41,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=963000.0, ans=0.0 2023-12-23 04:58:44,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.16 vs. limit=12.0 2023-12-23 04:58:47,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=963066.6666666666, ans=0.1 2023-12-23 04:58:50,669 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.018e+01 3.252e+01 3.359e+01 3.465e+01 3.821e+01, threshold=6.718e+01, percent-clipped=0.0 2023-12-23 04:59:00,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=963133.3333333334, ans=0.0 2023-12-23 04:59:06,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=963200.0, ans=0.125 2023-12-23 04:59:07,367 INFO [train.py:886] (1/4) Epoch 31, batch 1500, loss[loss=0.01478, audio_tagging_loss=0.01478, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4943912.16 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:59:11,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=963200.0, ans=0.2 2023-12-23 04:59:18,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=963266.6666666666, ans=0.125 2023-12-23 04:59:49,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.74 vs. limit=22.5 2023-12-23 04:59:54,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-12-23 04:59:59,920 INFO [train.py:886] (1/4) Epoch 31, batch 1550, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4937761.89 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:00:03,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=963533.3333333334, ans=0.0 2023-12-23 05:00:11,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=963600.0, ans=0.125 2023-12-23 05:00:22,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-12-23 05:00:35,103 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.972e+01 3.345e+01 3.487e+01 3.654e+01 4.145e+01, threshold=6.974e+01, percent-clipped=0.0 2023-12-23 05:00:47,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=963800.0, ans=0.0 2023-12-23 05:00:50,973 INFO [train.py:886] (1/4) Epoch 31, batch 1600, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4937806.29 frames. 
], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:00:52,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=963866.6666666666, ans=0.125 2023-12-23 05:01:12,555 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:01:20,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=964000.0, ans=0.125 2023-12-23 05:01:26,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=964066.6666666666, ans=0.1 2023-12-23 05:01:27,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-12-23 05:01:29,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=964066.6666666666, ans=0.125 2023-12-23 05:01:30,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=964066.6666666666, ans=0.125 2023-12-23 05:01:38,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=964133.3333333334, ans=0.125 2023-12-23 05:01:43,457 INFO [train.py:886] (1/4) Epoch 31, batch 1650, loss[loss=0.01396, audio_tagging_loss=0.01396, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4938033.16 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:01:51,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2023-12-23 05:02:01,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-12-23 05:02:18,489 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.266e+01 3.423e+01 3.548e+01 4.336e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 05:02:18,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=964400.0, ans=0.0 2023-12-23 05:02:20,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-12-23 05:02:29,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2023-12-23 05:02:35,992 INFO [train.py:886] (1/4) Epoch 31, batch 1700, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4943162.08 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:02:45,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. 
limit=15.0 2023-12-23 05:03:13,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964733.3333333334, ans=0.1 2023-12-23 05:03:13,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=964733.3333333334, ans=0.125 2023-12-23 05:03:13,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=964733.3333333334, ans=0.125 2023-12-23 05:03:23,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=964800.0, ans=0.07 2023-12-23 05:03:27,696 INFO [train.py:886] (1/4) Epoch 31, batch 1750, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4949812.02 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:03:28,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=964866.6666666666, ans=0.1 2023-12-23 05:03:31,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=964866.6666666666, ans=0.2 2023-12-23 05:03:36,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=964866.6666666666, ans=0.125 2023-12-23 05:03:50,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=965000.0, ans=0.125 2023-12-23 05:04:03,084 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.266e+01 3.414e+01 3.599e+01 4.382e+01, threshold=6.827e+01, percent-clipped=0.0 2023-12-23 05:04:03,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965066.6666666666, ans=0.1 2023-12-23 05:04:09,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965133.3333333334, ans=0.1 2023-12-23 05:04:12,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=965133.3333333334, ans=0.09899494936611666 2023-12-23 05:04:20,476 INFO [train.py:886] (1/4) Epoch 31, batch 1800, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4947892.51 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:04:34,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-23 05:05:11,668 INFO [train.py:886] (1/4) Epoch 31, batch 1850, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24073.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4954674.45 frames. 
], batch size: 100, lr: 3.48e-03, grad_scale: 32.0
2023-12-23 05:05:28,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=965600.0, ans=0.05
2023-12-23 05:05:34,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=965666.6666666666, ans=0.2
2023-12-23 05:05:47,723 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.028e+01 3.346e+01 3.530e+01 3.672e+01 4.173e+01, threshold=7.061e+01, percent-clipped=0.0
2023-12-23 05:05:51,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.50 vs. limit=15.0
2023-12-23 05:06:04,207 INFO [train.py:886] (1/4) Epoch 31, batch 1900, loss[loss=0.0165, audio_tagging_loss=0.0165, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4945538.20 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 32.0
2023-12-23 05:06:14,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=965933.3333333334, ans=0.125
2023-12-23 05:06:14,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=965933.3333333334, ans=0.125
2023-12-23 05:06:34,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=12.0
2023-12-23 05:06:42,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=966066.6666666666, ans=0.125
2023-12-23 05:06:43,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=966066.6666666666, ans=0.125
2023-12-23 05:06:45,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=966133.3333333334, ans=0.125
2023-12-23 05:06:57,419 INFO [train.py:886] (1/4) Epoch 31, batch 1950, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4941345.79 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 32.0
2023-12-23 05:06:58,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.13 vs. limit=12.0
2023-12-23 05:07:31,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=966400.0, ans=0.125
2023-12-23 05:07:32,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.281e+01 3.424e+01 3.602e+01 4.114e+01, threshold=6.849e+01, percent-clipped=0.0
2023-12-23 05:07:35,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=966400.0, ans=0.125
2023-12-23 05:07:48,000 INFO [train.py:886] (1/4) Epoch 31, batch 2000, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4944155.31 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0
2023-12-23 05:07:58,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0
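
The grad_scale field doubles from 32.0 to 64.0 at batch 2000 above and drops back to 32.0 a few hundred batches later, the signature of dynamic loss scaling in fp16 training: the scale grows after a run of overflow-free steps and is halved when a step produces inf/nan gradients. A sketch using torch.cuda.amp; the growth_interval and other settings here are illustrative, not the recipe's actual values.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, features, labels, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(features), labels)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # skips the update if grads overflowed
    scaler.update()                # grows the scale, or backs off after overflow
    return loss.detach()
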
2023-12-23 05:08:03,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966600.0, ans=0.125
2023-12-23 05:08:08,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=966666.6666666666, ans=0.125
2023-12-23 05:08:14,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0
2023-12-23 05:08:31,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=966800.0, ans=0.125
2023-12-23 05:08:31,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=966800.0, ans=0.05
2023-12-23 05:08:41,032 INFO [train.py:886] (1/4) Epoch 31, batch 2050, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4951172.87 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0
2023-12-23 05:08:47,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=966866.6666666666, ans=0.05
2023-12-23 05:08:55,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=966933.3333333334, ans=0.0
2023-12-23 05:08:56,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=966933.3333333334, ans=0.125
2023-12-23 05:08:58,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966933.3333333334, ans=0.125
2023-12-23 05:09:15,615 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.914e+01 3.257e+01 3.392e+01 3.574e+01 4.327e+01, threshold=6.783e+01, percent-clipped=0.0
2023-12-23 05:09:26,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=967133.3333333334, ans=0.125
2023-12-23 05:09:29,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=967133.3333333334, ans=0.2
2023-12-23 05:09:30,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=967200.0, ans=0.0
2023-12-23 05:09:31,443 INFO [train.py:886] (1/4) Epoch 31, batch 2100, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4953590.86 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0
2023-12-23 05:09:39,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967200.0, ans=0.1
2023-12-23 05:09:43,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs.
limit=6.0 2023-12-23 05:10:07,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967400.0, ans=0.1 2023-12-23 05:10:14,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=967466.6666666666, ans=0.2 2023-12-23 05:10:15,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=967466.6666666666, ans=0.0 2023-12-23 05:10:21,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=967466.6666666666, ans=0.125 2023-12-23 05:10:22,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=967466.6666666666, ans=0.2 2023-12-23 05:10:24,305 INFO [train.py:886] (1/4) Epoch 31, batch 2150, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4958375.33 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:10:28,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=967533.3333333334, ans=0.125 2023-12-23 05:10:37,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=967600.0, ans=0.0 2023-12-23 05:10:37,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2023-12-23 05:10:40,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=967600.0, ans=0.1 2023-12-23 05:10:56,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=967733.3333333334, ans=0.2 2023-12-23 05:10:59,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.971e+01 3.338e+01 3.483e+01 3.621e+01 4.579e+01, threshold=6.966e+01, percent-clipped=0.0 2023-12-23 05:11:00,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=967733.3333333334, ans=0.0 2023-12-23 05:11:17,277 INFO [train.py:886] (1/4) Epoch 31, batch 2200, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4955277.08 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:11:29,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=967933.3333333334, ans=0.2 2023-12-23 05:11:29,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.71 vs. 
limit=10.0 2023-12-23 05:11:35,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968000.0, ans=0.1 2023-12-23 05:11:56,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=968066.6666666666, ans=10.0 2023-12-23 05:11:56,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=968066.6666666666, ans=0.0 2023-12-23 05:12:08,051 INFO [train.py:886] (1/4) Epoch 31, batch 2250, loss[loss=0.01425, audio_tagging_loss=0.01425, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4950862.87 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:12:16,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=968200.0, ans=0.2 2023-12-23 05:12:17,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=968200.0, ans=0.125 2023-12-23 05:12:32,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=968333.3333333334, ans=0.07 2023-12-23 05:12:34,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=968333.3333333334, ans=0.2 2023-12-23 05:12:44,289 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.260e+01 3.452e+01 3.625e+01 3.919e+01, threshold=6.903e+01, percent-clipped=0.0 2023-12-23 05:12:48,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=968400.0, ans=0.125 2023-12-23 05:12:55,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2023-12-23 05:12:56,213 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.508e-03 2023-12-23 05:13:01,709 INFO [train.py:886] (1/4) Epoch 31, batch 2300, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4953328.02 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:13:02,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968533.3333333334, ans=0.1 2023-12-23 05:13:03,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=12.0 2023-12-23 05:13:31,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=968666.6666666666, ans=0.125 2023-12-23 05:13:39,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=968733.3333333334, ans=0.125 2023-12-23 05:13:54,145 INFO [train.py:886] (1/4) Epoch 31, batch 2350, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4952331.62 frames. 
], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:14:05,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=968933.3333333334, ans=0.0 2023-12-23 05:14:09,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=968933.3333333334, ans=0.0 2023-12-23 05:14:15,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=969000.0, ans=0.125 2023-12-23 05:14:29,949 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.284e+01 3.418e+01 3.549e+01 4.229e+01, threshold=6.835e+01, percent-clipped=0.0 2023-12-23 05:14:37,556 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:14:45,937 INFO [train.py:886] (1/4) Epoch 31, batch 2400, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4954726.63 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:15:07,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=969333.3333333334, ans=0.2 2023-12-23 05:15:09,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=969333.3333333334, ans=0.95 2023-12-23 05:15:09,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=969333.3333333334, ans=0.125 2023-12-23 05:15:20,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=969400.0, ans=0.1 2023-12-23 05:15:23,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.91 vs. limit=22.5 2023-12-23 05:15:39,260 INFO [train.py:886] (1/4) Epoch 31, batch 2450, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4959095.87 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:15:39,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=969533.3333333334, ans=0.0 2023-12-23 05:15:40,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=969533.3333333334, ans=0.07 2023-12-23 05:15:41,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=969533.3333333334, ans=0.125 2023-12-23 05:15:43,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=969533.3333333334, ans=0.125 2023-12-23 05:15:46,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. 
limit=6.0 2023-12-23 05:15:48,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=969600.0, ans=0.1 2023-12-23 05:15:51,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=969600.0, ans=0.1 2023-12-23 05:16:14,736 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.981e+01 3.300e+01 3.451e+01 3.618e+01 3.996e+01, threshold=6.901e+01, percent-clipped=0.0 2023-12-23 05:16:31,309 INFO [train.py:886] (1/4) Epoch 31, batch 2500, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4953822.84 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:16:32,429 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:16:41,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=969933.3333333334, ans=0.0 2023-12-23 05:17:03,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=970066.6666666666, ans=0.125 2023-12-23 05:17:09,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-23 05:17:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=970066.6666666666, ans=0.125 2023-12-23 05:17:13,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=970133.3333333334, ans=10.0 2023-12-23 05:17:16,551 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:17:23,740 INFO [train.py:886] (1/4) Epoch 31, batch 2550, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4953365.14 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:17:25,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=970200.0, ans=0.2 2023-12-23 05:17:59,856 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.321e+01 3.428e+01 3.568e+01 4.183e+01, threshold=6.855e+01, percent-clipped=0.0 2023-12-23 05:18:15,525 INFO [train.py:886] (1/4) Epoch 31, batch 2600, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4947845.83 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:18:20,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2023-12-23 05:18:26,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=970600.0, ans=0.1 2023-12-23 05:18:28,101 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:18:44,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. 
limit=15.0 2023-12-23 05:18:55,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=970733.3333333334, ans=0.125 2023-12-23 05:19:05,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:19:07,503 INFO [train.py:886] (1/4) Epoch 31, batch 2650, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4954353.68 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:19:07,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=970866.6666666666, ans=0.125 2023-12-23 05:19:15,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=970866.6666666666, ans=0.0 2023-12-23 05:19:18,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=970933.3333333334, ans=0.125 2023-12-23 05:19:44,615 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.992e+01 3.292e+01 3.427e+01 3.635e+01 4.044e+01, threshold=6.854e+01, percent-clipped=0.0 2023-12-23 05:19:58,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-12-23 05:20:00,186 INFO [train.py:886] (1/4) Epoch 31, batch 2700, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4957680.18 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:20:02,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-12-23 05:20:12,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=971266.6666666666, ans=0.125 2023-12-23 05:20:29,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=971333.3333333334, ans=0.1 2023-12-23 05:20:38,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-12-23 05:20:42,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=971466.6666666666, ans=0.0 2023-12-23 05:20:44,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=971466.6666666666, ans=0.5 2023-12-23 05:20:46,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=971466.6666666666, ans=0.025 2023-12-23 05:20:52,246 INFO [train.py:886] (1/4) Epoch 31, batch 2750, loss[loss=0.01038, audio_tagging_loss=0.01038, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4960295.15 frames. 
], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:21:10,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971600.0, ans=0.1 2023-12-23 05:21:26,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=971733.3333333334, ans=0.2 2023-12-23 05:21:28,308 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.291e+01 3.432e+01 3.574e+01 3.962e+01, threshold=6.863e+01, percent-clipped=0.0 2023-12-23 05:21:44,122 INFO [train.py:886] (1/4) Epoch 31, batch 2800, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4961100.71 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:21:44,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=971866.6666666666, ans=0.2 2023-12-23 05:21:44,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=971866.6666666666, ans=0.0 2023-12-23 05:21:47,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-23 05:21:58,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=971933.3333333334, ans=0.125 2023-12-23 05:22:13,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=972000.0, ans=0.125 2023-12-23 05:22:19,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=15.0 2023-12-23 05:22:31,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=972133.3333333334, ans=0.0 2023-12-23 05:22:37,109 INFO [train.py:886] (1/4) Epoch 31, batch 2850, loss[loss=0.01236, audio_tagging_loss=0.01236, over 21964.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4950026.39 frames. ], batch size: 107, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:23:13,338 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.924e+01 3.280e+01 3.439e+01 3.597e+01 4.174e+01, threshold=6.878e+01, percent-clipped=0.0 2023-12-23 05:23:28,955 INFO [train.py:886] (1/4) Epoch 31, batch 2900, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4947880.05 frames. 
], batch size: 100, lr: 3.47e-03, grad_scale: 32.0
2023-12-23 05:23:31,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=972533.3333333334, ans=0.0
2023-12-23 05:23:40,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=972600.0, ans=0.125
2023-12-23 05:24:02,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972733.3333333334, ans=0.1
2023-12-23 05:24:09,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=972800.0, ans=0.0
2023-12-23 05:24:20,316 INFO [train.py:886] (1/4) Epoch 31, batch 2950, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4949664.91 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0
2023-12-23 05:24:25,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=972866.6666666666, ans=0.125
2023-12-23 05:24:50,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=973000.0, ans=0.125
2023-12-23 05:24:56,485 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.890e+01 3.283e+01 3.439e+01 3.603e+01 3.990e+01, threshold=6.877e+01, percent-clipped=0.0
2023-12-23 05:25:08,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=973133.3333333334, ans=0.0
2023-12-23 05:25:12,295 INFO [train.py:886] (1/4) Epoch 31, batch 3000, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4955043.71 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0
2023-12-23 05:25:12,295 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 05:25:20,075 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4420, 3.4531, 3.0177, 0.7885], device='cuda:1')
2023-12-23 05:25:33,497 INFO [train.py:917] (1/4) Epoch 31, validation: loss=0.03277, audio_tagging_loss=0.03277, over 3737520.00 frames.
2023-12-23 05:25:33,498 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 05:25:40,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=973200.0, ans=0.0
2023-12-23 05:25:50,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=973266.6666666666, ans=0.125
2023-12-23 05:26:01,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5
2023-12-23 05:26:24,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=973466.6666666666, ans=0.1
2023-12-23 05:26:24,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=973466.6666666666, ans=0.2
2023-12-23 05:26:25,830 INFO [train.py:886] (1/4) Epoch 31, batch 3050, loss[loss=0.01406, audio_tagging_loss=0.01406, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4957726.85 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0
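
The train.py:909/917/918 block above interleaves a validation pass with training: the dev set is evaluated without gradients, a frame-weighted average loss is reported (0.03277 over 3737520.00 frames), and peak GPU memory is printed, presumably via torch.cuda.max_memory_allocated. A rough sketch of such a pass; the batch keys and the helper's name and signature are hypothetical, not icefall's actual API.

import torch

def compute_validation_loss(model, valid_dl, criterion, device):
    # Run the dev set without gradients and report a frame-weighted
    # average loss, as in the "Computing validation loss" block above.
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_dl:
            feats = batch["inputs"].to(device)   # (N, T, F) fbank features
            labels = batch["labels"].to(device)  # multi-hot event targets
            num_frames = feats.size(0) * feats.size(1)
            loss = criterion(model(feats), labels)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return tot_loss / tot_frames, mem_mb  # e.g. loss=0.03277, 14765MB
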
2023-12-23 05:26:57,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0
2023-12-23 05:27:01,677 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.915e+01 3.290e+01 3.424e+01 3.581e+01 4.026e+01, threshold=6.848e+01, percent-clipped=0.0
2023-12-23 05:27:06,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=973800.0, ans=0.0
2023-12-23 05:27:18,191 INFO [train.py:886] (1/4) Epoch 31, batch 3100, loss[loss=0.01089, audio_tagging_loss=0.01089, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4954855.06 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0
2023-12-23 05:27:32,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0
2023-12-23 05:27:36,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=973933.3333333334, ans=0.125
2023-12-23 05:27:57,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=974066.6666666666, ans=0.125
2023-12-23 05:27:58,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=974133.3333333334, ans=0.0
2023-12-23 05:28:09,278 INFO [train.py:886] (1/4) Epoch 31, batch 3150, loss[loss=0.01328, audio_tagging_loss=0.01328, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4952379.01 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0
2023-12-23 05:28:37,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=974333.3333333334, ans=0.125
2023-12-23 05:28:45,157 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.907e+01 3.356e+01 3.493e+01 3.608e+01 4.155e+01, threshold=6.985e+01, percent-clipped=0.0
2023-12-23 05:28:49,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=974466.6666666666, ans=0.0
2023-12-23 05:28:57,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=974466.6666666666, ans=0.125
2023-12-23 05:28:59,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=974466.6666666666, ans=0.125
2023-12-23 05:29:00,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=974533.3333333334, ans=0.125
2023-12-23 05:29:01,494 INFO [train.py:886] (1/4) Epoch 31, batch 3200, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4946560.96 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0
2023-12-23 05:29:01,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=974533.3333333334, ans=0.0
2023-12-23 05:29:02,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs.
2023-12-23 05:29:16,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=974600.0, ans=0.1 2023-12-23 05:29:17,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=974600.0, ans=0.2 2023-12-23 05:29:18,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.60 vs. limit=10.0 2023-12-23 05:29:25,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=974666.6666666666, ans=0.125 2023-12-23 05:29:31,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=974666.6666666666, ans=0.0 2023-12-23 05:29:50,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2023-12-23 05:29:52,911 INFO [train.py:886] (1/4) Epoch 31, batch 3250, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4949146.67 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:29:53,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=974866.6666666666, ans=0.025 2023-12-23 05:30:01,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=974866.6666666666, ans=0.0 2023-12-23 05:30:02,881 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.867e-03 2023-12-23 05:30:16,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=975000.0, ans=0.125 2023-12-23 05:30:17,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=975000.0, ans=0.0 2023-12-23 05:30:17,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=15.0 2023-12-23 05:30:19,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=975000.0, ans=0.125 2023-12-23 05:30:20,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=975000.0, ans=0.125 2023-12-23 05:30:30,253 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.264e+01 3.408e+01 3.519e+01 4.216e+01, threshold=6.815e+01, percent-clipped=0.0 2023-12-23 05:30:37,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-12-23 05:30:37,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975133.3333333334, ans=0.1 2023-12-23 05:30:45,264 INFO [train.py:886] (1/4) Epoch 31, batch 3300, loss[loss=0.01212, audio_tagging_loss=0.01212, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4952602.75 frames.
], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:30:50,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2023-12-23 05:30:55,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=975266.6666666666, ans=0.125 2023-12-23 05:31:04,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=975266.6666666666, ans=0.0 2023-12-23 05:31:25,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. limit=10.0 2023-12-23 05:31:34,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=975466.6666666666, ans=0.125 2023-12-23 05:31:37,494 INFO [train.py:886] (1/4) Epoch 31, batch 3350, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4953654.77 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:31:39,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=975533.3333333334, ans=0.125 2023-12-23 05:31:39,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=975533.3333333334, ans=0.2 2023-12-23 05:31:43,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=975533.3333333334, ans=0.125 2023-12-23 05:32:00,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=975666.6666666666, ans=0.125 2023-12-23 05:32:02,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=975666.6666666666, ans=0.125 2023-12-23 05:32:06,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=975666.6666666666, ans=0.125 2023-12-23 05:32:06,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=975666.6666666666, ans=0.125 2023-12-23 05:32:12,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2023-12-23 05:32:14,156 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.277e+01 3.424e+01 3.609e+01 4.697e+01, threshold=6.848e+01, percent-clipped=0.0 2023-12-23 05:32:17,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=975733.3333333334, ans=0.0 2023-12-23 05:32:21,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=975800.0, ans=0.07 2023-12-23 05:32:23,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=975800.0, ans=0.0 2023-12-23 05:32:28,544 INFO [train.py:886] (1/4) Epoch 31, batch 3400, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4956721.58 frames. 
], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:32:37,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=975933.3333333334, ans=0.025 2023-12-23 05:32:59,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=976066.6666666666, ans=0.125 2023-12-23 05:33:06,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=976066.6666666666, ans=0.125 2023-12-23 05:33:07,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2023-12-23 05:33:16,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2023-12-23 05:33:21,842 INFO [train.py:886] (1/4) Epoch 31, batch 3450, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4951620.25 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:33:24,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=976200.0, ans=0.0 2023-12-23 05:33:33,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=22.5 2023-12-23 05:33:42,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=976333.3333333334, ans=0.125 2023-12-23 05:33:45,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=976333.3333333334, ans=0.0 2023-12-23 05:33:58,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.343e+01 3.465e+01 3.665e+01 4.176e+01, threshold=6.930e+01, percent-clipped=0.0 2023-12-23 05:34:03,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=976466.6666666666, ans=0.0 2023-12-23 05:34:05,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=12.0 2023-12-23 05:34:09,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-12-23 05:34:13,711 INFO [train.py:886] (1/4) Epoch 31, batch 3500, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4949208.71 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:34:26,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2023-12-23 05:34:26,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=976600.0, ans=0.125 2023-12-23 05:34:32,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.01 vs. 
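limit=12.0

The periodic optim.py WARNING lines summarize recent gradient norms: the five values after "grad-norm quartiles" read as min/25%/median/75%/max, and the printed threshold tracks twice the running median, consistent with Clipping_scale=2.0 (above, 2 x 3.465e+01 = 6.930e+01). A rough sketch of that bookkeeping, assuming a plain history buffer rather than the optimizer's actual internals:

    import torch

    def clip_and_report(params, norm_history, clipping_scale=2.0, window=128):
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm()
        norm_history.append(norm.item())
        recent = torch.tensor(norm_history[-window:])
        quartiles = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]   # 2x the running median
        clipped = bool(norm > threshold)
        if clipped:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)   # scale the gradient down to the threshold
        return quartiles, threshold, clipped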
2023-12-23 05:34:46,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2023-12-23 05:34:48,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2023-12-23 05:35:05,386 INFO [train.py:886] (1/4) Epoch 31, batch 3550, loss[loss=0.01093, audio_tagging_loss=0.01093, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4950365.76 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:35:08,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=976866.6666666666, ans=0.125 2023-12-23 05:35:10,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=976866.6666666666, ans=0.0 2023-12-23 05:35:14,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=976933.3333333334, ans=0.2 2023-12-23 05:35:16,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=976933.3333333334, ans=0.0 2023-12-23 05:35:41,268 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.791e+01 3.288e+01 3.473e+01 3.624e+01 4.174e+01, threshold=6.946e+01, percent-clipped=0.0 2023-12-23 05:35:45,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=12.0 2023-12-23 05:35:53,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=977133.3333333334, ans=0.125 2023-12-23 05:35:54,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=977133.3333333334, ans=0.5 2023-12-23 05:35:57,775 INFO [train.py:886] (1/4) Epoch 31, batch 3600, loss[loss=0.009172, audio_tagging_loss=0.009172, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4955298.98 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:36:24,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=977333.3333333334, ans=0.125 2023-12-23 05:36:32,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=977400.0, ans=0.125 2023-12-23 05:36:32,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=977400.0, ans=0.1 2023-12-23 05:36:33,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=977400.0, ans=0.05 2023-12-23 05:36:50,215 INFO [train.py:886] (1/4) Epoch 31, batch 3650, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4956950.46 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:36:55,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs.
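limit=15.0

The Whitening lines compare a per-module covariance statistic against a limit: if C is the covariance of a group of channel activations, a metric of 1.0 means its eigenvalues are all equal (perfectly "white"), while large values like the metric=12.40 above mean the energy is concentrated in a few directions. One standard formulation of such a metric, given here as an assumption rather than scaling.py's exact code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations
        n, c = x.shape
        xg = x.reshape(n, num_groups, c // num_groups)
        metrics = []
        for g in range(num_groups):
            cov = xg[:, g, :].T @ xg[:, g, :] / n
            d = cov.shape[0]
            # mean(eig^2) / mean(eig)^2 == d * tr(C @ C) / tr(C)^2, always >= 1
            metrics.append(d * (cov @ cov).trace() / cov.trace() ** 2)
        return torch.stack(metrics).mean().item()

    x = torch.randn(1000, 384)   # near-white activations over 384 channels
    print(whitening_metric(x))   # ~1.0; values above the printed limit are the ones the log calls out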
2023-12-23 05:37:02,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-12-23 05:37:06,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=977600.0, ans=0.125 2023-12-23 05:37:09,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=977666.6666666666, ans=0.125 2023-12-23 05:37:09,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=977666.6666666666, ans=0.2 2023-12-23 05:37:18,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=977666.6666666666, ans=0.2 2023-12-23 05:37:25,684 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.024e+01 3.220e+01 3.394e+01 3.585e+01 4.042e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 05:37:35,580 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:37:38,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=977800.0, ans=0.0 2023-12-23 05:37:41,082 INFO [train.py:886] (1/4) Epoch 31, batch 3700, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4959815.41 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:37:50,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-12-23 05:38:03,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=978000.0, ans=0.125 2023-12-23 05:38:16,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978066.6666666666, ans=0.1 2023-12-23 05:38:33,228 INFO [train.py:886] (1/4) Epoch 31, batch 3750, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4955255.29 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:38:36,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=978200.0, ans=0.125 2023-12-23 05:38:39,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=978200.0, ans=0.0 2023-12-23 05:38:41,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978266.6666666666, ans=0.1 2023-12-23 05:38:42,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978266.6666666666, ans=0.0 2023-12-23 05:39:06,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs.
limit=15.0 2023-12-23 05:39:08,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=978400.0, ans=0.125 2023-12-23 05:39:09,091 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.376e+01 3.545e+01 3.717e+01 4.224e+01, threshold=7.090e+01, percent-clipped=0.0 2023-12-23 05:39:18,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-12-23 05:39:18,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=978466.6666666666, ans=0.125 2023-12-23 05:39:21,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978466.6666666666, ans=0.0 2023-12-23 05:39:24,137 INFO [train.py:886] (1/4) Epoch 31, batch 3800, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4949757.58 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:39:27,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=978533.3333333334, ans=0.125 2023-12-23 05:39:32,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=978533.3333333334, ans=0.125 2023-12-23 05:39:53,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=978666.6666666666, ans=0.05 2023-12-23 05:39:59,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978733.3333333334, ans=0.1 2023-12-23 05:40:04,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978800.0, ans=0.1 2023-12-23 05:40:08,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=978800.0, ans=0.2 2023-12-23 05:40:08,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2023-12-23 05:40:09,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0 2023-12-23 05:40:15,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=978866.6666666666, ans=0.125 2023-12-23 05:40:15,973 INFO [train.py:886] (1/4) Epoch 31, batch 3850, loss[loss=0.008439, audio_tagging_loss=0.008439, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4953634.84 frames. 
], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:40:16,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978866.6666666666, ans=0.1 2023-12-23 05:40:29,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=978933.3333333334, ans=0.125 2023-12-23 05:40:32,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=978933.3333333334, ans=0.125 2023-12-23 05:40:51,988 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.321e+01 3.429e+01 3.579e+01 4.062e+01, threshold=6.857e+01, percent-clipped=0.0 2023-12-23 05:40:53,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=979066.6666666666, ans=0.125 2023-12-23 05:40:55,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=979066.6666666666, ans=0.0 2023-12-23 05:41:04,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=979133.3333333334, ans=0.1 2023-12-23 05:41:05,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=979133.3333333334, ans=0.1 2023-12-23 05:41:07,590 INFO [train.py:886] (1/4) Epoch 31, batch 3900, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4955205.76 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:41:09,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=979200.0, ans=0.125 2023-12-23 05:41:24,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=979266.6666666666, ans=0.1 2023-12-23 05:41:28,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=979333.3333333334, ans=0.125 2023-12-23 05:41:28,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=979333.3333333334, ans=0.2 2023-12-23 05:41:34,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=979333.3333333334, ans=0.05 2023-12-23 05:41:58,206 INFO [train.py:886] (1/4) Epoch 31, batch 3950, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4957300.56 frames. 
], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:42:00,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=979533.3333333334, ans=0.0 2023-12-23 05:42:12,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=979600.0, ans=0.05 2023-12-23 05:42:33,391 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.895e+01 3.318e+01 3.420e+01 3.631e+01 6.052e+01, threshold=6.840e+01, percent-clipped=0.0 2023-12-23 05:42:43,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=979800.0, ans=0.125 2023-12-23 05:42:50,394 INFO [train.py:886] (1/4) Epoch 31, batch 4000, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4961329.93 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:42:58,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=979866.6666666666, ans=0.2 2023-12-23 05:43:15,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=980000.0, ans=0.125 2023-12-23 05:43:40,171 INFO [train.py:886] (1/4) Epoch 31, batch 4050, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4958578.24 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:43:41,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980200.0, ans=0.1 2023-12-23 05:43:49,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=980266.6666666666, ans=0.0 2023-12-23 05:43:52,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=980266.6666666666, ans=0.5 2023-12-23 05:44:05,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=980333.3333333334, ans=0.125 2023-12-23 05:44:06,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.22 vs. limit=22.5 2023-12-23 05:44:15,582 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.376e+01 3.511e+01 3.717e+01 4.360e+01, threshold=7.022e+01, percent-clipped=0.0 2023-12-23 05:44:21,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=980466.6666666666, ans=0.125 2023-12-23 05:44:23,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-12-23 05:44:27,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=980466.6666666666, ans=0.1 2023-12-23 05:44:31,109 INFO [train.py:886] (1/4) Epoch 31, batch 4100, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4952927.14 frames. 
], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:44:37,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=980533.3333333334, ans=0.0 2023-12-23 05:44:38,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=980533.3333333334, ans=0.0 2023-12-23 05:44:42,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=980600.0, ans=0.0 2023-12-23 05:44:42,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=980600.0, ans=0.125 2023-12-23 05:45:00,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=980666.6666666666, ans=0.025 2023-12-23 05:45:02,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0 2023-12-23 05:45:02,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=980733.3333333334, ans=0.125 2023-12-23 05:45:03,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=980733.3333333334, ans=0.125 2023-12-23 05:45:23,698 INFO [train.py:886] (1/4) Epoch 31, batch 4150, loss[loss=0.01477, audio_tagging_loss=0.01477, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4945893.86 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:45:25,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=980866.6666666666, ans=0.0 2023-12-23 05:45:49,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981000.0, ans=0.1 2023-12-23 05:45:59,588 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+01 3.259e+01 3.391e+01 3.549e+01 4.171e+01, threshold=6.781e+01, percent-clipped=0.0 2023-12-23 05:46:05,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=981133.3333333334, ans=0.125 2023-12-23 05:46:08,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-23 05:46:13,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=981200.0, ans=0.0 2023-12-23 05:46:13,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=22.5 2023-12-23 05:46:13,801 INFO [train.py:886] (1/4) Epoch 31, batch 4200, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4946511.60 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:46:24,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=981266.6666666666, ans=0.0 2023-12-23 05:46:26,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. 
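limit=15.0

In each train.py:886 line, the first loss[...] is the current batch (roughly 25000 frames) while tot_loss[...] is a running figure over about 4.95 million frames. That the frame count stays near 5M for thousands of batches suggests a decayed, frame-weighted average rather than a per-epoch sum. A small sketch under that assumption (the decay constant is chosen to reproduce the ~5M-frame plateau, not read from the code):

    def update_tot_loss(tot_sum, tot_frames, batch_loss, batch_frames, decay=0.995):
        tot_sum = tot_sum * decay + batch_loss * batch_frames
        tot_frames = tot_frames * decay + batch_frames
        return tot_sum, tot_frames

    tot_sum, tot_frames = 0.0, 0.0
    for _ in range(3000):        # many ~25k-frame batches
        tot_sum, tot_frames = update_tot_loss(tot_sum, tot_frames, 0.0125, 25000)
    print(tot_frames)            # saturates near 25000 / (1 - 0.995) = 5.0e6 frames
    print(tot_sum / tot_frames)  # the reported tot_loss, here 0.0125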
2023-12-23 05:46:29,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=981266.6666666666, ans=0.125 2023-12-23 05:46:32,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=981266.6666666666, ans=0.09899494936611666 2023-12-23 05:46:33,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981266.6666666666, ans=0.1 2023-12-23 05:46:34,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=981333.3333333334, ans=0.125 2023-12-23 05:46:40,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2023-12-23 05:46:51,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=981400.0, ans=0.2 2023-12-23 05:46:59,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=981466.6666666666, ans=0.125 2023-12-23 05:47:00,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2023-12-23 05:47:06,148 INFO [train.py:886] (1/4) Epoch 31, batch 4250, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4949513.78 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:47:08,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=981533.3333333334, ans=0.025 2023-12-23 05:47:14,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=981600.0, ans=0.125 2023-12-23 05:47:16,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=981600.0, ans=0.125 2023-12-23 05:47:21,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2023-12-23 05:47:41,477 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.892e+01 3.291e+01 3.423e+01 3.545e+01 3.863e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 05:47:47,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=981800.0, ans=0.0 2023-12-23 05:47:55,772 INFO [train.py:886] (1/4) Epoch 31, batch 4300, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4950657.12 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:48:32,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=982066.6666666666, ans=0.1 2023-12-23 05:48:47,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=982133.3333333334, ans=0.2 2023-12-23 05:48:48,964 INFO [train.py:886] (1/4) Epoch 31, batch 4350, loss[loss=0.009502, audio_tagging_loss=0.009502, over 23975.00 frames.
], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4951784.29 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:48:52,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=982200.0, ans=0.0 2023-12-23 05:48:55,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=982200.0, ans=0.125 2023-12-23 05:49:00,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=982266.6666666666, ans=0.125 2023-12-23 05:49:02,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=982266.6666666666, ans=0.95 2023-12-23 05:49:11,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=982333.3333333334, ans=0.0 2023-12-23 05:49:19,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2023-12-23 05:49:24,696 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.048e+01 3.354e+01 3.488e+01 3.602e+01 4.133e+01, threshold=6.977e+01, percent-clipped=0.0 2023-12-23 05:49:35,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0 2023-12-23 05:49:37,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=982466.6666666666, ans=0.0 2023-12-23 05:49:40,830 INFO [train.py:886] (1/4) Epoch 31, batch 4400, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4942626.61 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:49:43,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=982533.3333333334, ans=0.0 2023-12-23 05:49:50,324 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:49:52,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=982600.0, ans=0.0 2023-12-23 05:50:17,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=982733.3333333334, ans=0.125 2023-12-23 05:50:23,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=982800.0, ans=0.0 2023-12-23 05:50:31,897 INFO [train.py:886] (1/4) Epoch 31, batch 4450, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4944630.38 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:50:33,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=982866.6666666666, ans=10.0 2023-12-23 05:50:43,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. 
limit=12.0 2023-12-23 05:50:44,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=982933.3333333334, ans=0.0 2023-12-23 05:50:55,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=983000.0, ans=0.125 2023-12-23 05:50:56,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=983000.0, ans=0.125 2023-12-23 05:51:07,906 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.353e+01 3.451e+01 3.641e+01 4.281e+01, threshold=6.902e+01, percent-clipped=0.0 2023-12-23 05:51:22,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0 2023-12-23 05:51:25,203 INFO [train.py:886] (1/4) Epoch 31, batch 4500, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4948510.09 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:51:36,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=983266.6666666666, ans=0.0 2023-12-23 05:51:36,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=983266.6666666666, ans=0.125 2023-12-23 05:51:55,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=983400.0, ans=0.0 2023-12-23 05:52:16,190 INFO [train.py:886] (1/4) Epoch 31, batch 4550, loss[loss=0.009648, audio_tagging_loss=0.009648, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4947672.62 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:52:53,977 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.975e+01 3.298e+01 3.434e+01 3.578e+01 4.234e+01, threshold=6.868e+01, percent-clipped=0.0 2023-12-23 05:53:07,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=983800.0, ans=0.125 2023-12-23 05:53:09,083 INFO [train.py:886] (1/4) Epoch 31, batch 4600, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4946949.53 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:53:21,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=983933.3333333334, ans=0.1 2023-12-23 05:53:38,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=984000.0, ans=0.1 2023-12-23 05:53:58,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2023-12-23 05:54:01,452 INFO [train.py:886] (1/4) Epoch 31, batch 4650, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4951557.44 frames. 
], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:54:02,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=984200.0, ans=0.5 2023-12-23 05:54:16,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=984266.6666666666, ans=0.125 2023-12-23 05:54:25,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=984333.3333333334, ans=0.0 2023-12-23 05:54:32,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=984400.0, ans=0.0 2023-12-23 05:54:32,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=984400.0, ans=0.125 2023-12-23 05:54:33,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=984400.0, ans=0.0 2023-12-23 05:54:33,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=984400.0, ans=0.1 2023-12-23 05:54:34,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=984400.0, ans=0.1 2023-12-23 05:54:37,331 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.324e+01 3.438e+01 3.563e+01 4.326e+01, threshold=6.876e+01, percent-clipped=0.0 2023-12-23 05:54:41,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=984466.6666666666, ans=0.0 2023-12-23 05:54:43,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=984466.6666666666, ans=0.125 2023-12-23 05:54:52,073 INFO [train.py:886] (1/4) Epoch 31, batch 4700, loss[loss=0.01518, audio_tagging_loss=0.01518, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4942785.10 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:55:04,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=984600.0, ans=0.125 2023-12-23 05:55:09,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=984666.6666666666, ans=0.0 2023-12-23 05:55:16,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=984666.6666666666, ans=0.125 2023-12-23 05:55:17,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-12-23 05:55:32,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=984800.0, ans=0.0 2023-12-23 05:55:39,298 INFO [train.py:886] (1/4) Epoch 31, batch 4750, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4940099.37 frames. 
], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:55:46,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=984866.6666666666, ans=0.2 2023-12-23 05:56:15,678 INFO [train.py:886] (1/4) Epoch 32, batch 0, loss[loss=0.03135, audio_tagging_loss=0.03135, over 23944.00 frames. ], tot_loss[loss=0.03135, audio_tagging_loss=0.03135, over 23944.00 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:56:15,678 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 05:56:33,177 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5915, 4.0142, 4.0964, 3.7657], device='cuda:1') 2023-12-23 05:56:36,812 INFO [train.py:917] (1/4) Epoch 32, validation: loss=0.03288, audio_tagging_loss=0.03288, over 3737520.00 frames. 2023-12-23 05:56:36,813 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 05:56:46,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=985040.0, ans=0.125 2023-12-23 05:56:57,223 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.142e+01 3.370e+01 3.559e+01 3.900e+01 9.561e+01, threshold=7.118e+01, percent-clipped=7.0 2023-12-23 05:57:03,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=985106.6666666666, ans=0.125 2023-12-23 05:57:04,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=985106.6666666666, ans=0.0 2023-12-23 05:57:06,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=985173.3333333334, ans=0.2 2023-12-23 05:57:09,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=985173.3333333334, ans=0.95 2023-12-23 05:57:14,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=985173.3333333334, ans=0.125 2023-12-23 05:57:22,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985240.0, ans=0.1 2023-12-23 05:57:23,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985240.0, ans=0.1 2023-12-23 05:57:24,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=985240.0, ans=0.2 2023-12-23 05:57:26,982 INFO [train.py:886] (1/4) Epoch 32, batch 50, loss[loss=0.01821, audio_tagging_loss=0.01821, over 25000.00 frames. ], tot_loss[loss=0.01967, audio_tagging_loss=0.01967, over 1118871.89 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:57:29,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.88 vs. 
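limit=22.5

The epoch-boundary block above (train.py:909/917/918) pauses training at batch 0, sweeps the dev set for a validation loss over 3737520.00 frames, and reports the peak CUDA memory seen so far. A hedged sketch of such a hook; the batch layout and model signature are assumptions for illustration, not train.py's API:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, device):
        model.eval()
        total_loss, total_frames = 0.0, 0
        for batch in dev_loader:
            feats = batch["features"].to(device)   # assumed batch layout
            labels = batch["labels"].to(device)
            loss, frames = model(feats, labels)    # assumed to return (summed loss, num frames)
            total_loss += loss.item()
            total_frames += frames
        model.train()
        max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={total_loss / total_frames:.5f}; "
              f"maximum memory allocated so far is {max_mb}MB")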
2023-12-23 05:57:55,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=985440.0, ans=0.1 2023-12-23 05:57:57,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=985506.6666666666, ans=0.125 2023-12-23 05:58:18,035 INFO [train.py:886] (1/4) Epoch 32, batch 100, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01687, audio_tagging_loss=0.01687, over 1975957.99 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:58:27,650 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:58:38,572 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.265e+01 3.723e+01 3.980e+01 4.376e+01 5.362e+01, threshold=7.961e+01, percent-clipped=0.0 2023-12-23 05:58:43,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=985773.3333333334, ans=0.0 2023-12-23 05:58:47,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=985773.3333333334, ans=0.125 2023-12-23 05:58:53,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=985840.0, ans=0.125 2023-12-23 05:58:54,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-12-23 05:58:59,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985906.6666666666, ans=0.1 2023-12-23 05:59:00,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2023-12-23 05:59:09,722 INFO [train.py:886] (1/4) Epoch 32, batch 150, loss[loss=0.01601, audio_tagging_loss=0.01601, over 25000.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 2637063.42 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:59:27,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=986040.0, ans=15.0 2023-12-23 05:59:41,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=986173.3333333334, ans=0.2 2023-12-23 05:59:58,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=986240.0, ans=15.0 2023-12-23 06:00:01,208 INFO [train.py:886] (1/4) Epoch 32, batch 200, loss[loss=0.01295, audio_tagging_loss=0.01295, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 3150995.75 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:00:03,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs.
limit=10.0 2023-12-23 06:00:18,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=986373.3333333334, ans=0.1 2023-12-23 06:00:19,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=986373.3333333334, ans=0.125 2023-12-23 06:00:21,578 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.980e+01 3.390e+01 3.500e+01 3.693e+01 4.218e+01, threshold=7.000e+01, percent-clipped=0.0 2023-12-23 06:00:34,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=986506.6666666666, ans=0.2 2023-12-23 06:00:40,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=986573.3333333334, ans=0.07 2023-12-23 06:00:51,470 INFO [train.py:886] (1/4) Epoch 32, batch 250, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 3549723.89 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:01:15,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0 2023-12-23 06:01:23,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=986840.0, ans=0.2 2023-12-23 06:01:32,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=986840.0, ans=0.1 2023-12-23 06:01:43,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=986906.6666666666, ans=0.0 2023-12-23 06:01:44,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=986973.3333333334, ans=0.1 2023-12-23 06:01:45,012 INFO [train.py:886] (1/4) Epoch 32, batch 300, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 3854531.17 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:01:48,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=986973.3333333334, ans=0.1 2023-12-23 06:02:06,008 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.374e+01 3.482e+01 3.656e+01 5.710e+01, threshold=6.964e+01, percent-clipped=0.0 2023-12-23 06:02:19,025 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:02:21,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=987173.3333333334, ans=0.1 2023-12-23 06:02:25,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=987240.0, ans=0.0 2023-12-23 06:02:37,257 INFO [train.py:886] (1/4) Epoch 32, batch 350, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4093269.13 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:02:38,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.73 vs. 
limit=12.0 2023-12-23 06:02:48,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.62 vs. limit=22.5 2023-12-23 06:03:07,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=987506.6666666666, ans=0.125 2023-12-23 06:03:28,790 INFO [train.py:886] (1/4) Epoch 32, batch 400, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4281599.95 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:03:32,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=987640.0, ans=0.07 2023-12-23 06:03:38,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=987640.0, ans=0.0 2023-12-23 06:03:40,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-12-23 06:03:41,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=987706.6666666666, ans=0.0 2023-12-23 06:03:49,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.966e+01 3.292e+01 3.425e+01 3.603e+01 4.421e+01, threshold=6.851e+01, percent-clipped=0.0 2023-12-23 06:04:12,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=987906.6666666666, ans=0.125 2023-12-23 06:04:15,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=987906.6666666666, ans=0.2 2023-12-23 06:04:17,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=987906.6666666666, ans=0.2 2023-12-23 06:04:18,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=987906.6666666666, ans=0.04949747468305833 2023-12-23 06:04:20,510 INFO [train.py:886] (1/4) Epoch 32, batch 450, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4433124.50 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:04:29,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-12-23 06:04:31,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=988040.0, ans=0.125 2023-12-23 06:04:47,674 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:04:56,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=22.5 2023-12-23 06:05:05,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988240.0, ans=0.1 2023-12-23 06:05:13,757 INFO [train.py:886] (1/4) Epoch 32, batch 500, loss[loss=0.01164, audio_tagging_loss=0.01164, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4548564.23 frames. 
], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:05:23,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2023-12-23 06:05:34,074 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.279e+01 3.431e+01 3.556e+01 4.457e+01, threshold=6.862e+01, percent-clipped=0.0 2023-12-23 06:05:35,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=988440.0, ans=0.5 2023-12-23 06:05:43,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.64 vs. limit=22.5 2023-12-23 06:05:45,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2023-12-23 06:05:52,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=988506.6666666666, ans=0.0 2023-12-23 06:05:55,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=988573.3333333334, ans=0.125 2023-12-23 06:06:04,441 INFO [train.py:886] (1/4) Epoch 32, batch 550, loss[loss=0.01161, audio_tagging_loss=0.01161, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4647167.75 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:06:27,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=988773.3333333334, ans=0.125 2023-12-23 06:06:31,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-12-23 06:06:31,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=22.5 2023-12-23 06:06:33,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=988773.3333333334, ans=0.2 2023-12-23 06:06:36,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=988840.0, ans=0.02 2023-12-23 06:06:56,695 INFO [train.py:886] (1/4) Epoch 32, batch 600, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4712433.05 frames. 
2023-12-23 06:06:57,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=988973.3333333334, ans=0.125
2023-12-23 06:06:59,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=988973.3333333334, ans=0.1
2023-12-23 06:07:02,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=988973.3333333334, ans=0.05
2023-12-23 06:07:03,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=988973.3333333334, ans=0.0
2023-12-23 06:07:16,464 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.995e+01 3.360e+01 3.500e+01 3.662e+01 4.640e+01, threshold=6.999e+01, percent-clipped=0.0
2023-12-23 06:07:16,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=989106.6666666666, ans=0.125
2023-12-23 06:07:36,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=989240.0, ans=0.125
2023-12-23 06:07:37,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5
2023-12-23 06:07:44,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0
2023-12-23 06:07:47,462 INFO [train.py:886] (1/4) Epoch 32, batch 650, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4760715.07 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0
2023-12-23 06:08:35,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=989573.3333333334, ans=0.125
2023-12-23 06:08:36,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=989573.3333333334, ans=0.125
2023-12-23 06:08:38,946 INFO [train.py:886] (1/4) Epoch 32, batch 700, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4802276.18 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0
2023-12-23 06:08:39,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=989640.0, ans=0.125
2023-12-23 06:08:57,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=989706.6666666666, ans=0.2
2023-12-23 06:09:00,792 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.361e+01 3.483e+01 3.638e+01 4.133e+01, threshold=6.965e+01, percent-clipped=0.0
2023-12-23 06:09:10,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=989840.0, ans=0.125
2023-12-23 06:09:12,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=989840.0, ans=0.125
2023-12-23 06:09:17,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=989840.0, ans=0.1
2023-12-23 06:09:30,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=989906.6666666666, ans=0.0
2023-12-23 06:09:32,138 INFO [train.py:886] (1/4) Epoch 32, batch 750, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4838564.58 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0
2023-12-23 06:09:33,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=989973.3333333334, ans=0.125
2023-12-23 06:09:34,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=989973.3333333334, ans=0.125
2023-12-23 06:09:35,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=989973.3333333334, ans=0.0
2023-12-23 06:09:37,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=989973.3333333334, ans=0.1
2023-12-23 06:09:52,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0
2023-12-23 06:09:54,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.37 vs. limit=22.5
2023-12-23 06:10:13,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990240.0, ans=0.1
2023-12-23 06:10:13,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=990240.0, ans=0.125
2023-12-23 06:10:23,088 INFO [train.py:886] (1/4) Epoch 32, batch 800, loss[loss=0.009964, audio_tagging_loss=0.009964, over 24011.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4866636.74 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:10:39,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0
2023-12-23 06:10:44,495 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.268e+01 3.423e+01 3.577e+01 3.951e+01, threshold=6.846e+01, percent-clipped=0.0
2023-12-23 06:10:46,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=990440.0, ans=0.125
2023-12-23 06:11:05,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=990573.3333333334, ans=0.1
2023-12-23 06:11:08,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990573.3333333334, ans=0.1
2023-12-23 06:11:16,038 INFO [train.py:886] (1/4) Epoch 32, batch 850, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4889925.34 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:11:49,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=990840.0, ans=0.0
2023-12-23 06:11:52,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=990840.0, ans=0.025
2023-12-23 06:11:54,383 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:11:54,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2023-12-23 06:12:07,826 INFO [train.py:886] (1/4) Epoch 32, batch 900, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4905727.68 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:12:20,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=991040.0, ans=0.125
2023-12-23 06:12:26,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.34 vs. limit=15.0
2023-12-23 06:12:28,211 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.389e+01 3.537e+01 3.671e+01 4.365e+01, threshold=7.073e+01, percent-clipped=0.0
2023-12-23 06:12:45,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0
2023-12-23 06:12:45,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.52 vs. limit=22.5
2023-12-23 06:12:48,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=991240.0, ans=0.0
2023-12-23 06:12:52,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=991240.0, ans=0.0
2023-12-23 06:12:58,818 INFO [train.py:886] (1/4) Epoch 32, batch 950, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4915510.81 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
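Each Whitening line compares a per-module "metric" against a limit; the metric grows as the channel covariance of the module's output becomes less isotropic, and a module is only nudged back toward whiteness when it exceeds its limit. A rough illustration of one such whiteness measure (the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which is 1.0 for a perfectly white signal); the actual formula and scaling used in scaling.py are not reproduced here:

    import torch

    def whiteness_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); larger return value = less white.
        chans = x.shape[1] // num_groups
        worst = 0.0
        for g in range(num_groups):
            xg = x[:, g * chans:(g + 1) * chans]
            xg = xg - xg.mean(dim=0)
            cov = (xg.T @ xg) / xg.shape[0]
            eigs = torch.linalg.eigvalsh(cov)
            worst = max(worst, ((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)).item())
        return worst

    x = torch.randn(4000, 256)      # near-white input
    print(whiteness_metric(x))      # close to 1.0; correlated activations score higher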
2023-12-23 06:13:09,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=991373.3333333334, ans=0.125
2023-12-23 06:13:09,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=991373.3333333334, ans=0.5
2023-12-23 06:13:14,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=991373.3333333334, ans=0.125
2023-12-23 06:13:20,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0
2023-12-23 06:13:24,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=991440.0, ans=0.0
2023-12-23 06:13:32,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=991506.6666666666, ans=0.125
2023-12-23 06:13:35,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=991506.6666666666, ans=0.125
2023-12-23 06:13:36,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=991506.6666666666, ans=0.125
2023-12-23 06:13:47,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0
2023-12-23 06:13:49,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=991573.3333333334, ans=0.0
2023-12-23 06:13:51,872 INFO [train.py:886] (1/4) Epoch 32, batch 1000, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4918661.49 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:13:52,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=991640.0, ans=0.125
2023-12-23 06:14:08,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991706.6666666666, ans=0.1
2023-12-23 06:14:11,544 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.978e+01 3.269e+01 3.394e+01 3.560e+01 3.959e+01, threshold=6.787e+01, percent-clipped=0.0
2023-12-23 06:14:22,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=991840.0, ans=0.125
2023-12-23 06:14:23,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=991840.0, ans=0.125
2023-12-23 06:14:25,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=991840.0, ans=0.0
2023-12-23 06:14:33,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0
2023-12-23 06:14:40,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=991906.6666666666, ans=0.0
2023-12-23 06:14:42,958 INFO [train.py:886] (1/4) Epoch 32, batch 1050, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4929540.87 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:14:49,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=991973.3333333334, ans=0.125
2023-12-23 06:14:49,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=991973.3333333334, ans=0.2
2023-12-23 06:14:55,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0
2023-12-23 06:15:17,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=992173.3333333334, ans=0.125
2023-12-23 06:15:31,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=992240.0, ans=0.125
2023-12-23 06:15:31,353 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:15:33,861 INFO [train.py:886] (1/4) Epoch 32, batch 1100, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4932658.89 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:15:54,838 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.843e+01 3.284e+01 3.426e+01 3.635e+01 4.027e+01, threshold=6.852e+01, percent-clipped=0.0
2023-12-23 06:15:55,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.21 vs. limit=15.0
2023-12-23 06:16:26,217 INFO [train.py:886] (1/4) Epoch 32, batch 1150, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4937331.70 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:16:26,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992640.0, ans=0.1
2023-12-23 06:16:48,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0
2023-12-23 06:16:49,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=992773.3333333334, ans=10.0
2023-12-23 06:17:16,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=992973.3333333334, ans=0.07
2023-12-23 06:17:17,331 INFO [train.py:886] (1/4) Epoch 32, batch 1200, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4949594.60 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:17:39,160 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.986e+01 3.358e+01 3.522e+01 3.691e+01 4.259e+01, threshold=7.044e+01, percent-clipped=0.0
2023-12-23 06:17:39,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=993106.6666666666, ans=0.125
2023-12-23 06:17:39,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=993106.6666666666, ans=0.125
2023-12-23 06:18:10,292 INFO [train.py:886] (1/4) Epoch 32, batch 1250, loss[loss=0.01437, audio_tagging_loss=0.01437, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4941025.13 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:18:39,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5
2023-12-23 06:18:41,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=993506.6666666666, ans=0.2
2023-12-23 06:18:57,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=993573.3333333334, ans=0.125
2023-12-23 06:19:01,456 INFO [train.py:886] (1/4) Epoch 32, batch 1300, loss[loss=0.01013, audio_tagging_loss=0.01013, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4944816.25 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:19:07,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=993640.0, ans=0.125
2023-12-23 06:19:15,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=993706.6666666666, ans=0.125
2023-12-23 06:19:18,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=993706.6666666666, ans=0.125
2023-12-23 06:19:22,692 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.042e+01 3.376e+01 3.531e+01 3.672e+01 4.244e+01, threshold=7.062e+01, percent-clipped=0.0
2023-12-23 06:19:24,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=12.0
2023-12-23 06:19:25,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0
2023-12-23 06:19:25,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.75 vs. limit=15.0
2023-12-23 06:19:37,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=993840.0, ans=0.1
2023-12-23 06:19:40,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5
2023-12-23 06:19:45,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=993906.6666666666, ans=0.1
2023-12-23 06:19:48,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=993906.6666666666, ans=0.0
2023-12-23 06:19:53,914 INFO [train.py:886] (1/4) Epoch 32, batch 1350, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4948418.90 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:20:03,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5
2023-12-23 06:20:14,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=994106.6666666666, ans=0.2
2023-12-23 06:20:18,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5
2023-12-23 06:20:21,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0
2023-12-23 06:20:22,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=994106.6666666666, ans=0.04949747468305833
2023-12-23 06:20:26,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=994173.3333333334, ans=0.2
2023-12-23 06:20:39,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=994240.0, ans=0.1
2023-12-23 06:20:41,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=994240.0, ans=0.0
2023-12-23 06:20:44,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=994240.0, ans=0.125
2023-12-23 06:20:46,244 INFO [train.py:886] (1/4) Epoch 32, batch 1400, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4949345.85 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
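The tot_loss[...] figures are not per-batch numbers: their frame counts climb from 4.28M at batch 400 and then hover near 4.95M, the signature of a frames-weighted running sum with exponential decay. A small sketch of that bookkeeping; the decay constant is chosen only for illustration (with decay 0.995 and roughly 25000 frames per batch the frame total saturates near 5M, the ballpark seen in the log):

    class RunningLoss:
        """Frames-weighted, exponentially decayed running loss (illustrative)."""

        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0    # decayed sum of loss * frames
            self.frames = 0.0      # decayed sum of frames

        def update(self, loss: float, num_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + loss * num_frames
            self.frames = self.frames * self.decay + num_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    rl = RunningLoss()
    for batch_loss in [0.0130, 0.0125, 0.0121]:
        rl.update(batch_loss, 25000.0)
    print(rl.value)  # smoothed loss; frames saturate near 25000 / (1 - decay)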
2023-12-23 06:20:46,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=994306.6666666666, ans=0.125
2023-12-23 06:20:48,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=994306.6666666666, ans=0.015
2023-12-23 06:20:49,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=994306.6666666666, ans=0.125
2023-12-23 06:20:54,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=994306.6666666666, ans=0.125
2023-12-23 06:21:06,624 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.912e+01 3.298e+01 3.472e+01 3.566e+01 4.099e+01, threshold=6.943e+01, percent-clipped=0.0
2023-12-23 06:21:17,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=994506.6666666666, ans=0.0
2023-12-23 06:21:27,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5
2023-12-23 06:21:30,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=994573.3333333334, ans=0.0
2023-12-23 06:21:38,020 INFO [train.py:886] (1/4) Epoch 32, batch 1450, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4952007.02 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:21:45,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=994640.0, ans=0.05
2023-12-23 06:21:47,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=994706.6666666666, ans=0.125
2023-12-23 06:22:01,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=994773.3333333334, ans=0.2
2023-12-23 06:22:12,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=994840.0, ans=0.125
2023-12-23 06:22:18,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=994906.6666666666, ans=0.2
2023-12-23 06:22:30,198 INFO [train.py:886] (1/4) Epoch 32, batch 1500, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4953542.03 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:22:45,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995040.0, ans=0.1
2023-12-23 06:22:49,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0
2023-12-23 06:22:50,504 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.293e+01 3.421e+01 3.550e+01 3.928e+01, threshold=6.843e+01, percent-clipped=0.0
2023-12-23 06:22:57,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=995106.6666666666, ans=0.125
2023-12-23 06:22:58,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=995106.6666666666, ans=0.05
2023-12-23 06:22:58,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=995106.6666666666, ans=0.0
2023-12-23 06:23:01,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5
2023-12-23 06:23:03,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=995173.3333333334, ans=0.125
2023-12-23 06:23:14,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=995240.0, ans=0.125
2023-12-23 06:23:15,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=995240.0, ans=0.04949747468305833
2023-12-23 06:23:21,275 INFO [train.py:886] (1/4) Epoch 32, batch 1550, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4952938.79 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:23:43,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=995440.0, ans=0.125
2023-12-23 06:23:43,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=15.0
2023-12-23 06:23:47,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=995440.0, ans=0.2
2023-12-23 06:24:08,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995573.3333333334, ans=0.1
2023-12-23 06:24:10,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=995573.3333333334, ans=0.125
2023-12-23 06:24:13,319 INFO [train.py:886] (1/4) Epoch 32, batch 1600, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4947981.81 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
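Several of the bypass skip rates above print long decimals rather than round numbers: 0.04949747468305833 is exactly 0.07/sqrt(2), and the 0.09899494936611666 that appears later in the log is 0.14/sqrt(2), so the schedule appears to scale a base rate by 1/sqrt(2). That interpretation is an inference from the digits, but the arithmetic itself is easy to check:

    import math
    print(0.07 / math.sqrt(2))  # 0.04949747468305833, as logged for bypass.skip_rate
    print(0.14 / math.sqrt(2))  # 0.09899494936611666, as logged for other skip rates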
2023-12-23 06:24:13,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=995640.0, ans=0.0
2023-12-23 06:24:20,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=995640.0, ans=0.125
2023-12-23 06:24:29,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=995706.6666666666, ans=0.95
2023-12-23 06:24:34,478 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.084e+01 3.339e+01 3.484e+01 3.655e+01 4.537e+01, threshold=6.968e+01, percent-clipped=0.0
2023-12-23 06:24:51,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=995840.0, ans=0.125
2023-12-23 06:24:55,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.63 vs. limit=22.5
2023-12-23 06:24:55,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:25:00,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=995906.6666666666, ans=0.125
2023-12-23 06:25:04,880 INFO [train.py:886] (1/4) Epoch 32, batch 1650, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4946097.23 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0
2023-12-23 06:25:07,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0
2023-12-23 06:25:12,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=995973.3333333334, ans=22.5
2023-12-23 06:25:23,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=996040.0, ans=0.125
2023-12-23 06:25:23,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=996040.0, ans=0.0
2023-12-23 06:25:36,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5
2023-12-23 06:25:48,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0
2023-12-23 06:25:56,158 INFO [train.py:886] (1/4) Epoch 32, batch 1700, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4950658.72 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0
2023-12-23 06:26:16,299 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.316e+01 3.484e+01 3.622e+01 4.382e+01, threshold=6.969e+01, percent-clipped=0.0
2023-12-23 06:26:26,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0
2023-12-23 06:26:29,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=996506.6666666666, ans=0.0
2023-12-23 06:26:46,896 INFO [train.py:886] (1/4) Epoch 32, batch 1750, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4954375.08 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0
2023-12-23 06:27:16,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=996773.3333333334, ans=0.125
2023-12-23 06:27:16,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=996773.3333333334, ans=0.125
2023-12-23 06:27:29,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=996906.6666666666, ans=0.0
2023-12-23 06:27:40,120 INFO [train.py:886] (1/4) Epoch 32, batch 1800, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4957602.65 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0
2023-12-23 06:27:42,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=996973.3333333334, ans=0.2
2023-12-23 06:27:43,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=996973.3333333334, ans=0.0
2023-12-23 06:27:44,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=996973.3333333334, ans=0.125
2023-12-23 06:27:51,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=997040.0, ans=0.0
2023-12-23 06:27:59,180 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.314e+01 3.468e+01 3.623e+01 4.214e+01, threshold=6.936e+01, percent-clipped=0.0
2023-12-23 06:28:05,756 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:28:12,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=997173.3333333334, ans=0.125
2023-12-23 06:28:21,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0
2023-12-23 06:28:22,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=997240.0, ans=0.0
2023-12-23 06:28:29,602 INFO [train.py:886] (1/4) Epoch 32, batch 1850, loss[loss=0.01623, audio_tagging_loss=0.01623, over 24750.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4955137.23 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 32.0
2023-12-23 06:28:30,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=997306.6666666666, ans=0.1
2023-12-23 06:29:22,522 INFO [train.py:886] (1/4) Epoch 32, batch 1900, loss[loss=0.01067, audio_tagging_loss=0.01067, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4952639.08 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 32.0
2023-12-23 06:29:22,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=997640.0, ans=0.125
2023-12-23 06:29:27,613 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:29:35,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.34 vs. limit=10.0
2023-12-23 06:29:39,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0
2023-12-23 06:29:42,703 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.383e+01 3.531e+01 3.681e+01 4.206e+01, threshold=7.062e+01, percent-clipped=0.0
2023-12-23 06:30:05,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=997906.6666666666, ans=0.1
2023-12-23 06:30:09,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=997906.6666666666, ans=0.1
2023-12-23 06:30:13,314 INFO [train.py:886] (1/4) Epoch 32, batch 1950, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4951117.07 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 32.0
2023-12-23 06:30:17,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=997973.3333333334, ans=0.125
2023-12-23 06:30:29,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=998040.0, ans=0.1
2023-12-23 06:31:02,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=998240.0, ans=0.125
2023-12-23 06:31:04,959 INFO [train.py:886] (1/4) Epoch 32, batch 2000, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4950068.20 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:31:06,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=998306.6666666666, ans=0.125
2023-12-23 06:31:18,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=998373.3333333334, ans=0.125
2023-12-23 06:31:26,068 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.968e+01 3.338e+01 3.486e+01 3.657e+01 4.428e+01, threshold=6.972e+01, percent-clipped=0.0
2023-12-23 06:31:29,158 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:31:33,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=998440.0, ans=0.1
2023-12-23 06:31:33,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=998440.0, ans=0.125
2023-12-23 06:31:42,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=998506.6666666666, ans=0.125
2023-12-23 06:31:43,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=998506.6666666666, ans=0.0
2023-12-23 06:31:44,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=998573.3333333334, ans=0.2
2023-12-23 06:31:45,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5
2023-12-23 06:31:50,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=998573.3333333334, ans=0.125
2023-12-23 06:31:56,447 INFO [train.py:886] (1/4) Epoch 32, batch 2050, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4955780.84 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:32:01,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=998640.0, ans=0.1
2023-12-23 06:32:11,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=998706.6666666666, ans=0.125
2023-12-23 06:32:21,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=998773.3333333334, ans=0.125
2023-12-23 06:32:40,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=998906.6666666666, ans=0.125
2023-12-23 06:32:46,716 INFO [train.py:886] (1/4) Epoch 32, batch 2100, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4956014.56 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
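Between batch 1950 and batch 2000 the logged grad_scale doubles from 32.0 to 64.0 and then stays there. That is the standard dynamic loss-scaling behaviour for fp16 training (this run has use_fp16 enabled): halve the scale when gradients overflow, double it after a long streak of finite steps. A minimal sketch of that rule; the growth interval below is an assumption, not a value read from train.py:

    def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                          growth_interval: int = 2000):
        """One step of dynamic fp16 loss scaling (illustrative)."""
        if found_inf:
            return scale * 0.5, 0          # overflow: back off, reset streak
        good_steps += 1
        if good_steps >= growth_interval:
            return scale * 2.0, 0          # long healthy streak: double
        return scale, good_steps

    scale, streak = 32.0, 1999
    scale, streak = update_grad_scale(scale, found_inf=False, good_steps=streak)
    print(scale)  # 64.0, matching the jump seen in the log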
2023-12-23 06:33:04,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=999040.0, ans=0.1
2023-12-23 06:33:08,555 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.022e+01 3.328e+01 3.487e+01 3.635e+01 4.227e+01, threshold=6.974e+01, percent-clipped=0.0
2023-12-23 06:33:12,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=999106.6666666666, ans=0.125
2023-12-23 06:33:37,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=999240.0, ans=0.125
2023-12-23 06:33:39,876 INFO [train.py:886] (1/4) Epoch 32, batch 2150, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4961552.33 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:33:48,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=999373.3333333334, ans=0.125
2023-12-23 06:34:08,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=999440.0, ans=0.125
2023-12-23 06:34:10,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5
2023-12-23 06:34:18,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=999506.6666666666, ans=0.0
2023-12-23 06:34:30,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0
2023-12-23 06:34:31,150 INFO [train.py:886] (1/4) Epoch 32, batch 2200, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4957381.57 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:34:51,673 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.381e+01 3.492e+01 3.667e+01 4.618e+01, threshold=6.983e+01, percent-clipped=0.0
2023-12-23 06:34:51,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=999773.3333333334, ans=0.0
2023-12-23 06:34:53,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=999773.3333333334, ans=0.1
2023-12-23 06:34:55,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=999773.3333333334, ans=0.2
2023-12-23 06:35:00,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.51 vs. limit=5.0
2023-12-23 06:35:12,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=999906.6666666666, ans=0.125
2023-12-23 06:35:22,730 INFO [train.py:886] (1/4) Epoch 32, batch 2250, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4951671.08 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:35:24,915 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:35:32,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1000040.0, ans=0.125
2023-12-23 06:35:40,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1000040.0, ans=0.125
2023-12-23 06:35:55,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2023-12-23 06:36:12,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1000240.0, ans=0.0
2023-12-23 06:36:15,253 INFO [train.py:886] (1/4) Epoch 32, batch 2300, loss[loss=0.0156, audio_tagging_loss=0.0156, over 22402.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4953099.46 frames. ], batch size: 107, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:36:16,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1000306.6666666666, ans=0.1
2023-12-23 06:36:17,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1000306.6666666666, ans=0.2
2023-12-23 06:36:35,438 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.075e+01 3.334e+01 3.452e+01 3.612e+01 4.472e+01, threshold=6.904e+01, percent-clipped=0.0
2023-12-23 06:36:46,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1000506.6666666666, ans=0.2
2023-12-23 06:36:51,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1000506.6666666666, ans=0.125
2023-12-23 06:36:58,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1000573.3333333334, ans=0.0
2023-12-23 06:37:06,013 INFO [train.py:886] (1/4) Epoch 32, batch 2350, loss[loss=0.009544, audio_tagging_loss=0.009544, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4953098.18 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:37:15,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1000640.0, ans=0.2
2023-12-23 06:37:16,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1000706.6666666666, ans=0.125
2023-12-23 06:37:21,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0
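The per-batch frame counts are consistent with 10-second AudioSet clips at 100 fbank frames per second and the encoder's 4x subsampling: 10 * 100 / 4 = 250 output frames per cut, so 100 cuts give the recurring 25000.00 frames, and batches with shorter clips (such as the 22402 frames at batch size 107 above) fall below that. A sketch of the arithmetic, with the 100 frames/s feature rate assumed:

    def encoder_frames(duration_s: float, frames_per_s: int = 100,
                       subsampling_factor: int = 4) -> int:
        """Encoder output frames for one cut (illustrative bookkeeping)."""
        return int(duration_s * frames_per_s / subsampling_factor)

    print(100 * encoder_frames(10.0))  # 25000, the common full-batch case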
2023-12-23 06:37:24,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000706.6666666666, ans=0.1
2023-12-23 06:37:30,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1000773.3333333334, ans=0.05
2023-12-23 06:37:39,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1000840.0, ans=0.125
2023-12-23 06:37:42,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0
2023-12-23 06:37:43,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1000840.0, ans=0.125
2023-12-23 06:37:45,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1000840.0, ans=0.1
2023-12-23 06:37:50,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1000906.6666666666, ans=0.025
2023-12-23 06:37:58,440 INFO [train.py:886] (1/4) Epoch 32, batch 2400, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4949939.41 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:38:02,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1000973.3333333334, ans=0.1
2023-12-23 06:38:02,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1000973.3333333334, ans=0.1
2023-12-23 06:38:08,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1001040.0, ans=0.125
2023-12-23 06:38:09,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1001040.0, ans=0.125
2023-12-23 06:38:19,331 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.045e+01 3.328e+01 3.460e+01 3.635e+01 4.169e+01, threshold=6.920e+01, percent-clipped=0.0
2023-12-23 06:38:20,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1001106.6666666666, ans=0.0
2023-12-23 06:38:37,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1001173.3333333334, ans=0.125
2023-12-23 06:38:37,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=10.0
2023-12-23 06:38:50,374 INFO [train.py:886] (1/4) Epoch 32, batch 2450, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4954631.78 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:39:01,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001373.3333333334, ans=0.1
2023-12-23 06:39:28,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1001506.6666666666, ans=0.0
2023-12-23 06:39:31,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1001573.3333333334, ans=0.125
2023-12-23 06:39:35,290 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 06:39:35,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0
2023-12-23 06:39:41,482 INFO [train.py:886] (1/4) Epoch 32, batch 2500, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4951158.40 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0
2023-12-23 06:39:43,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1001640.0, ans=0.0
2023-12-23 06:40:02,498 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.169e+01 3.349e+01 3.493e+01 3.683e+01 6.550e+01, threshold=6.987e+01, percent-clipped=0.0
2023-12-23 06:40:25,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2023-12-23 06:40:26,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1001906.6666666666, ans=0.125
2023-12-23 06:40:26,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1001906.6666666666, ans=0.125
2023-12-23 06:40:33,441 INFO [train.py:886] (1/4) Epoch 32, batch 2550, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24034.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4951344.31 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:40:39,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2023-12-23 06:41:01,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1002106.6666666666, ans=0.2
2023-12-23 06:41:03,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1002173.3333333334, ans=0.125
2023-12-23 06:41:12,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1002173.3333333334, ans=12.0
2023-12-23 06:41:25,728 INFO [train.py:886] (1/4) Epoch 32, batch 2600, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4948099.87 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:41:35,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1002373.3333333334, ans=0.125
2023-12-23 06:41:42,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0
2023-12-23 06:41:45,337 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.988e+01 3.350e+01 3.516e+01 3.667e+01 4.402e+01, threshold=7.033e+01, percent-clipped=0.0
2023-12-23 06:41:52,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1002440.0, ans=0.0
2023-12-23 06:42:01,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1002506.6666666666, ans=0.1
2023-12-23 06:42:16,568 INFO [train.py:886] (1/4) Epoch 32, batch 2650, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4951882.96 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:42:41,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1002773.3333333334, ans=0.125
2023-12-23 06:42:46,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0
2023-12-23 06:43:07,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1002906.6666666666, ans=0.125
2023-12-23 06:43:10,036 INFO [train.py:886] (1/4) Epoch 32, batch 2700, loss[loss=0.01067, audio_tagging_loss=0.01067, over 21636.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4952264.34 frames. ], batch size: 107, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:43:29,821 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.300e+01 3.397e+01 3.576e+01 4.184e+01, threshold=6.794e+01, percent-clipped=0.0
2023-12-23 06:44:01,192 INFO [train.py:886] (1/4) Epoch 32, batch 2750, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4950901.30 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:44:12,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1003373.3333333334, ans=0.125
2023-12-23 06:44:34,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1003506.6666666666, ans=0.125
2023-12-23 06:44:34,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2023-12-23 06:44:37,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0
2023-12-23 06:44:49,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2023-12-23 06:44:53,696 INFO [train.py:886] (1/4) Epoch 32, batch 2800, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4951854.22 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:45:14,668 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.950e+01 3.316e+01 3.538e+01 3.680e+01 4.583e+01, threshold=7.076e+01, percent-clipped=0.0
2023-12-23 06:45:46,246 INFO [train.py:886] (1/4) Epoch 32, batch 2850, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24750.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4944673.71 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0
2023-12-23 06:45:51,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1003973.3333333334, ans=0.125
2023-12-23 06:45:52,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1003973.3333333334, ans=0.09899494936611666
2023-12-23 06:45:53,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1003973.3333333334, ans=0.125
2023-12-23 06:46:10,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1004106.6666666666, ans=0.125
2023-12-23 06:46:14,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5
2023-12-23 06:46:18,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1004173.3333333334, ans=0.125
2023-12-23 06:46:28,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1004240.0, ans=0.1
2023-12-23 06:46:34,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1004240.0, ans=0.125
2023-12-23 06:46:36,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1004306.6666666666, ans=0.0
2023-12-23 06:46:37,742 INFO [train.py:886] (1/4) Epoch 32, batch 2900, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4943845.31 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0
], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:46:45,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1004306.6666666666, ans=0.125 2023-12-23 06:46:48,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1004373.3333333334, ans=0.0 2023-12-23 06:46:57,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1004373.3333333334, ans=0.125 2023-12-23 06:46:59,936 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.913e+01 3.284e+01 3.453e+01 3.590e+01 5.160e+01, threshold=6.905e+01, percent-clipped=0.0 2023-12-23 06:47:02,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1004440.0, ans=0.125 2023-12-23 06:47:21,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1004573.3333333334, ans=0.125 2023-12-23 06:47:26,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1004573.3333333334, ans=0.0 2023-12-23 06:47:30,612 INFO [train.py:886] (1/4) Epoch 32, batch 2950, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4943988.25 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:47:42,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1004706.6666666666, ans=0.09899494936611666 2023-12-23 06:47:42,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-12-23 06:47:45,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1004706.6666666666, ans=0.0 2023-12-23 06:48:07,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1004840.0, ans=0.0 2023-12-23 06:48:11,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1004906.6666666666, ans=0.1 2023-12-23 06:48:13,442 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:48:15,415 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:48:20,541 INFO [train.py:886] (1/4) Epoch 32, batch 3000, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4951051.56 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:48:20,542 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 06:48:33,318 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3695, 2.0244, 3.1811, 2.2969, 3.8267, 2.7779, 1.2867, 2.2150], device='cuda:1') 2023-12-23 06:48:41,459 INFO [train.py:917] (1/4) Epoch 32, validation: loss=0.03345, audio_tagging_loss=0.03345, over 3737520.00 frames. 
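The optim.py WARNING lines in this stretch summarize gradient clipping: the five values are the min/25%/50%/75%/max quartiles of recent gradient norms, followed by the active threshold and the fraction of batches clipped. The printed threshold tracks clipping_scale times the median quartile (for example 2.0 * 3.516e+01 is about 7.033e+01, the threshold printed at 06:41:45). Below is a minimal sketch of that mechanism, assuming a running window of norms; GradNormClipper and its interface are illustrative assumptions, not the actual optim.py implementation.

    # Minimal sketch (not the actual icefall optim.py): keep a window of
    # recent gradient norms, report their quartiles, and clip against
    # clipping_scale * median. Names here are illustrative assumptions.
    from collections import deque

    import torch


    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total gradient norms

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            if not params:
                return 0.0
            total_norm = torch.norm(
                torch.stack([p.grad.detach().norm(2.0) for p in params]), 2.0
            ).item()
            self.norms.append(total_norm)

            # Quartiles as printed in the log: min / 25% / 50% / 75% / max.
            s = sorted(self.norms)
            quartiles = [s[int(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]

            # The logged threshold is consistent with clipping_scale * median,
            # e.g. 2.0 * 3.516e+01 ~= 7.033e+01 at 06:41:45.
            threshold = self.clipping_scale * quartiles[2]
            if total_norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / total_norm)
            return total_norm

With percent-clipped=0.0 throughout this interval, total_norm never exceeded the threshold, so the clipping branch never fired.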
2023-12-23 06:48:41,459 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 06:48:42,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1004973.3333333334, ans=0.1 2023-12-23 06:48:43,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1004973.3333333334, ans=0.2 2023-12-23 06:49:01,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0 2023-12-23 06:49:02,379 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.356e+01 3.479e+01 3.651e+01 4.265e+01, threshold=6.959e+01, percent-clipped=0.0 2023-12-23 06:49:14,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1005173.3333333334, ans=0.125 2023-12-23 06:49:18,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1005173.3333333334, ans=0.0 2023-12-23 06:49:28,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1005240.0, ans=0.0 2023-12-23 06:49:29,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2023-12-23 06:49:33,531 INFO [train.py:886] (1/4) Epoch 32, batch 3050, loss[loss=0.009939, audio_tagging_loss=0.009939, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4957122.95 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:49:37,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-23 06:49:55,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1005440.0, ans=0.1 2023-12-23 06:50:10,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1005506.6666666666, ans=0.0 2023-12-23 06:50:12,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1005506.6666666666, ans=0.125 2023-12-23 06:50:22,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1005573.3333333334, ans=0.1 2023-12-23 06:50:24,659 INFO [train.py:886] (1/4) Epoch 32, batch 3100, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4958608.84 frames. 
], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:50:27,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1005640.0, ans=0.07 2023-12-23 06:50:36,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1005706.6666666666, ans=0.125 2023-12-23 06:50:41,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1005706.6666666666, ans=0.09899494936611666 2023-12-23 06:50:45,084 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.066e+01 3.353e+01 3.474e+01 3.650e+01 4.005e+01, threshold=6.948e+01, percent-clipped=0.0 2023-12-23 06:50:48,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2023-12-23 06:50:59,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1005840.0, ans=0.2 2023-12-23 06:51:07,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1005906.6666666666, ans=0.125 2023-12-23 06:51:16,244 INFO [train.py:886] (1/4) Epoch 32, batch 3150, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4950600.85 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:51:17,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1005973.3333333334, ans=0.1 2023-12-23 06:51:51,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1006173.3333333334, ans=0.0 2023-12-23 06:51:53,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006173.3333333334, ans=0.1 2023-12-23 06:52:09,060 INFO [train.py:886] (1/4) Epoch 32, batch 3200, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4941016.05 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:52:19,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2023-12-23 06:52:28,451 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.025e+01 3.282e+01 3.455e+01 3.590e+01 4.189e+01, threshold=6.910e+01, percent-clipped=0.0 2023-12-23 06:52:33,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1006440.0, ans=0.125 2023-12-23 06:52:37,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1006440.0, ans=0.0 2023-12-23 06:52:43,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1006506.6666666666, ans=0.2 2023-12-23 06:53:00,208 INFO [train.py:886] (1/4) Epoch 32, batch 3250, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4938628.57 frames. 
], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:53:18,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1006706.6666666666, ans=0.0 2023-12-23 06:53:30,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2023-12-23 06:53:32,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1006840.0, ans=0.05 2023-12-23 06:53:52,677 INFO [train.py:886] (1/4) Epoch 32, batch 3300, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4945554.64 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:54:08,143 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:54:13,385 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.804e+01 3.291e+01 3.461e+01 3.674e+01 4.146e+01, threshold=6.923e+01, percent-clipped=0.0 2023-12-23 06:54:25,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1007173.3333333334, ans=0.2 2023-12-23 06:54:44,514 INFO [train.py:886] (1/4) Epoch 32, batch 3350, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4951529.07 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:54:51,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.35 vs. limit=22.5 2023-12-23 06:54:57,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1007373.3333333334, ans=0.125 2023-12-23 06:55:13,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1007440.0, ans=0.2 2023-12-23 06:55:22,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1007506.6666666666, ans=0.125 2023-12-23 06:55:36,291 INFO [train.py:886] (1/4) Epoch 32, batch 3400, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4956135.77 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:55:51,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1007706.6666666666, ans=0.1 2023-12-23 06:55:57,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.946e+01 3.386e+01 3.545e+01 3.714e+01 5.112e+01, threshold=7.090e+01, percent-clipped=0.0 2023-12-23 06:55:57,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1007773.3333333334, ans=0.125 2023-12-23 06:55:58,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1007773.3333333334, ans=0.0 2023-12-23 06:56:11,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. 
limit=15.0 2023-12-23 06:56:28,826 INFO [train.py:886] (1/4) Epoch 32, batch 3450, loss[loss=0.01158, audio_tagging_loss=0.01158, over 24101.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4950583.06 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 64.0 2023-12-23 06:56:33,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1007973.3333333334, ans=0.125 2023-12-23 06:56:43,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1008040.0, ans=15.0 2023-12-23 06:56:44,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1008040.0, ans=0.125 2023-12-23 06:56:49,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1008106.6666666666, ans=0.2 2023-12-23 06:57:04,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1008173.3333333334, ans=0.125 2023-12-23 06:57:08,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1008173.3333333334, ans=0.07 2023-12-23 06:57:14,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1008240.0, ans=0.0 2023-12-23 06:57:20,536 INFO [train.py:886] (1/4) Epoch 32, batch 3500, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4941024.98 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 64.0 2023-12-23 06:57:24,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1008306.6666666666, ans=0.1 2023-12-23 06:57:39,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1008373.3333333334, ans=0.125 2023-12-23 06:57:42,418 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.074e+01 3.340e+01 3.505e+01 3.678e+01 3.978e+01, threshold=7.011e+01, percent-clipped=0.0 2023-12-23 06:57:52,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1008506.6666666666, ans=0.125 2023-12-23 06:58:12,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1008640.0, ans=15.0 2023-12-23 06:58:12,683 INFO [train.py:886] (1/4) Epoch 32, batch 3550, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4944268.94 frames. 
], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:58:15,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1008640.0, ans=0.2 2023-12-23 06:58:32,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1008706.6666666666, ans=0.05 2023-12-23 06:58:50,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1008840.0, ans=0.2 2023-12-23 06:58:54,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-12-23 06:59:01,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1008906.6666666666, ans=0.125 2023-12-23 06:59:05,019 INFO [train.py:886] (1/4) Epoch 32, batch 3600, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4951852.09 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:59:16,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1009040.0, ans=0.0 2023-12-23 06:59:18,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1009040.0, ans=0.0 2023-12-23 06:59:18,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1009040.0, ans=0.0 2023-12-23 06:59:25,498 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.038e+01 3.316e+01 3.432e+01 3.603e+01 4.151e+01, threshold=6.864e+01, percent-clipped=0.0 2023-12-23 06:59:31,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=12.0 2023-12-23 06:59:48,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1009240.0, ans=0.125 2023-12-23 06:59:55,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2023-12-23 06:59:55,910 INFO [train.py:886] (1/4) Epoch 32, batch 3650, loss[loss=0.01422, audio_tagging_loss=0.01422, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4952557.84 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:00:00,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1009306.6666666666, ans=0.1 2023-12-23 07:00:05,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.94 vs. 
limit=15.0 2023-12-23 07:00:16,366 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.255e-03 2023-12-23 07:00:19,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1009440.0, ans=0.125 2023-12-23 07:00:26,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1009506.6666666666, ans=0.125 2023-12-23 07:00:36,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1009506.6666666666, ans=0.015 2023-12-23 07:00:40,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1009573.3333333334, ans=0.0 2023-12-23 07:00:48,394 INFO [train.py:886] (1/4) Epoch 32, batch 3700, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4958310.22 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:00:48,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1009640.0, ans=0.125 2023-12-23 07:01:10,309 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.017e+01 3.377e+01 3.513e+01 3.629e+01 4.104e+01, threshold=7.025e+01, percent-clipped=0.0 2023-12-23 07:01:24,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1009840.0, ans=0.125 2023-12-23 07:01:31,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2023-12-23 07:01:39,867 INFO [train.py:886] (1/4) Epoch 32, batch 3750, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4954915.02 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:01:44,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1009973.3333333334, ans=0.2 2023-12-23 07:02:31,704 INFO [train.py:886] (1/4) Epoch 32, batch 3800, loss[loss=0.01523, audio_tagging_loss=0.01523, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4948355.61 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:02:40,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1010306.6666666666, ans=0.1 2023-12-23 07:02:48,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1010373.3333333334, ans=0.0 2023-12-23 07:02:49,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1010373.3333333334, ans=0.1 2023-12-23 07:02:53,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1010440.0, ans=0.125 2023-12-23 07:02:54,398 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.001e+01 3.373e+01 3.530e+01 3.741e+01 4.683e+01, threshold=7.061e+01, percent-clipped=0.0 2023-12-23 07:03:04,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1010506.6666666666, ans=0.125 2023-12-23 07:03:04,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1010506.6666666666, ans=0.125 2023-12-23 07:03:12,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1010506.6666666666, ans=0.0 2023-12-23 07:03:24,695 INFO [train.py:886] (1/4) Epoch 32, batch 3850, loss[loss=0.01304, audio_tagging_loss=0.01304, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4951036.96 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:03:24,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1010640.0, ans=0.125 2023-12-23 07:03:32,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2023-12-23 07:03:38,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1010706.6666666666, ans=0.125 2023-12-23 07:03:40,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1010706.6666666666, ans=0.0 2023-12-23 07:03:45,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1010773.3333333334, ans=0.0 2023-12-23 07:04:10,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1010906.6666666666, ans=0.1 2023-12-23 07:04:11,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1010906.6666666666, ans=0.0 2023-12-23 07:04:15,176 INFO [train.py:886] (1/4) Epoch 32, batch 3900, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4950674.24 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:04:20,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1010973.3333333334, ans=0.0 2023-12-23 07:04:23,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. 
limit=15.0 2023-12-23 07:04:28,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011040.0, ans=0.1 2023-12-23 07:04:31,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1011040.0, ans=0.125 2023-12-23 07:04:34,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0 2023-12-23 07:04:36,351 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.332e+01 3.481e+01 3.670e+01 4.273e+01, threshold=6.961e+01, percent-clipped=0.0 2023-12-23 07:04:49,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5 2023-12-23 07:05:04,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1011240.0, ans=0.1 2023-12-23 07:05:06,143 INFO [train.py:886] (1/4) Epoch 32, batch 3950, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4954765.62 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:05:52,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1011573.3333333334, ans=0.1 2023-12-23 07:05:58,515 INFO [train.py:886] (1/4) Epoch 32, batch 4000, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4958203.92 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:06:04,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-12-23 07:06:11,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1011706.6666666666, ans=0.125 2023-12-23 07:06:19,224 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.094e+01 3.350e+01 3.494e+01 3.630e+01 4.145e+01, threshold=6.988e+01, percent-clipped=0.0 2023-12-23 07:06:32,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.54 vs. limit=22.5 2023-12-23 07:06:41,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1011906.6666666666, ans=0.0 2023-12-23 07:06:49,336 INFO [train.py:886] (1/4) Epoch 32, batch 4050, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4956598.05 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:07:14,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1012106.6666666666, ans=0.0 2023-12-23 07:07:30,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.33 vs. 
limit=15.0 2023-12-23 07:07:34,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1012240.0, ans=0.0 2023-12-23 07:07:41,202 INFO [train.py:886] (1/4) Epoch 32, batch 4100, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4951881.39 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:07:59,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-12-23 07:08:02,287 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.038e+01 3.388e+01 3.515e+01 3.658e+01 4.060e+01, threshold=7.030e+01, percent-clipped=0.0 2023-12-23 07:08:20,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1012506.6666666666, ans=0.0 2023-12-23 07:08:23,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1012573.3333333334, ans=0.125 2023-12-23 07:08:24,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-23 07:08:32,033 INFO [train.py:886] (1/4) Epoch 32, batch 4150, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4949953.29 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:08:37,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1012640.0, ans=0.0 2023-12-23 07:08:41,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1012640.0, ans=0.0 2023-12-23 07:09:04,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1012840.0, ans=0.2 2023-12-23 07:09:05,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2023-12-23 07:09:07,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1012840.0, ans=0.0 2023-12-23 07:09:17,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1012906.6666666666, ans=0.0 2023-12-23 07:09:19,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1012906.6666666666, ans=0.125 2023-12-23 07:09:24,408 INFO [train.py:886] (1/4) Epoch 32, batch 4200, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4953102.17 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:09:24,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1012973.3333333334, ans=0.1 2023-12-23 07:09:47,514 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+01 3.331e+01 3.526e+01 3.677e+01 4.548e+01, threshold=7.052e+01, percent-clipped=0.0 2023-12-23 07:09:51,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1013106.6666666666, ans=0.1 2023-12-23 07:09:55,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-12-23 07:10:00,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1013173.3333333334, ans=0.025 2023-12-23 07:10:03,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1013173.3333333334, ans=0.125 2023-12-23 07:10:17,395 INFO [train.py:886] (1/4) Epoch 32, batch 4250, loss[loss=0.009352, audio_tagging_loss=0.009352, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4951142.56 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:10:25,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1013306.6666666666, ans=0.1 2023-12-23 07:10:29,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1013373.3333333334, ans=0.0 2023-12-23 07:10:33,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1013373.3333333334, ans=0.09899494936611666 2023-12-23 07:10:36,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2023-12-23 07:10:39,286 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:10:47,412 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:10:53,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1013506.6666666666, ans=0.125 2023-12-23 07:10:56,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1013506.6666666666, ans=0.0 2023-12-23 07:10:57,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-12-23 07:11:10,489 INFO [train.py:886] (1/4) Epoch 32, batch 4300, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4954697.25 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:11:25,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1013706.6666666666, ans=0.05 2023-12-23 07:11:32,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1013773.3333333334, ans=0.0 2023-12-23 07:11:33,502 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.371e+01 3.452e+01 3.602e+01 4.385e+01, threshold=6.904e+01, percent-clipped=0.0 2023-12-23 07:11:39,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1013773.3333333334, ans=0.0 2023-12-23 07:11:43,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1013840.0, ans=0.2 2023-12-23 07:11:54,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1013906.6666666666, ans=0.125 2023-12-23 07:12:03,712 INFO [train.py:886] (1/4) Epoch 32, batch 4350, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4955217.91 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:12:06,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-23 07:12:27,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1014106.6666666666, ans=0.2 2023-12-23 07:12:39,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1014173.3333333334, ans=0.1 2023-12-23 07:12:41,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1014173.3333333334, ans=0.2 2023-12-23 07:12:45,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-12-23 07:12:55,255 INFO [train.py:886] (1/4) Epoch 32, batch 4400, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24947.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4948296.49 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:13:00,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1014306.6666666666, ans=0.1 2023-12-23 07:13:06,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.81 vs. 
limit=15.0 2023-12-23 07:13:16,352 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.370e+01 3.596e+01 3.730e+01 4.489e+01, threshold=7.192e+01, percent-clipped=0.0 2023-12-23 07:13:19,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1014440.0, ans=0.0 2023-12-23 07:13:28,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1014506.6666666666, ans=0.125 2023-12-23 07:13:29,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1014506.6666666666, ans=0.2 2023-12-23 07:13:40,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1014573.3333333334, ans=0.125 2023-12-23 07:13:43,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5 2023-12-23 07:13:46,497 INFO [train.py:886] (1/4) Epoch 32, batch 4450, loss[loss=0.01038, audio_tagging_loss=0.01038, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4945975.49 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:13:55,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1014640.0, ans=0.125 2023-12-23 07:14:18,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1014840.0, ans=0.1 2023-12-23 07:14:19,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2023-12-23 07:14:22,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1014840.0, ans=0.0 2023-12-23 07:14:23,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1014840.0, ans=0.125 2023-12-23 07:14:28,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1014906.6666666666, ans=0.1 2023-12-23 07:14:32,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1014906.6666666666, ans=0.0 2023-12-23 07:14:33,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1014906.6666666666, ans=0.1 2023-12-23 07:14:34,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1014906.6666666666, ans=0.125 2023-12-23 07:14:37,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1014906.6666666666, ans=0.125 2023-12-23 07:14:38,863 INFO [train.py:886] (1/4) Epoch 32, batch 4500, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4946167.76 frames. 
], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:14:46,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1014973.3333333334, ans=0.125 2023-12-23 07:14:51,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=1015040.0, ans=0.2 2023-12-23 07:15:00,271 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.070e+01 3.361e+01 3.496e+01 3.729e+01 4.409e+01, threshold=6.991e+01, percent-clipped=0.0 2023-12-23 07:15:00,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1015106.6666666666, ans=0.0 2023-12-23 07:15:02,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1015106.6666666666, ans=0.09899494936611666 2023-12-23 07:15:03,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1015106.6666666666, ans=0.125 2023-12-23 07:15:09,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1015173.3333333334, ans=0.0 2023-12-23 07:15:09,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2023-12-23 07:15:16,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0 2023-12-23 07:15:24,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1015240.0, ans=0.0 2023-12-23 07:15:30,780 INFO [train.py:886] (1/4) Epoch 32, batch 4550, loss[loss=0.009522, audio_tagging_loss=0.009522, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4949740.62 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:15:33,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1015306.6666666666, ans=0.1 2023-12-23 07:15:40,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1015373.3333333334, ans=0.0 2023-12-23 07:16:15,554 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:16:18,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1015573.3333333334, ans=0.125 2023-12-23 07:16:23,690 INFO [train.py:886] (1/4) Epoch 32, batch 4600, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4954105.87 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:16:32,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1015706.6666666666, ans=0.2 2023-12-23 07:16:32,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2023-12-23 07:16:45,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.990e+01 3.339e+01 3.480e+01 3.659e+01 4.110e+01, threshold=6.960e+01, percent-clipped=0.0 2023-12-23 07:16:56,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1015840.0, ans=0.025 2023-12-23 07:17:04,313 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:17:15,912 INFO [train.py:886] (1/4) Epoch 32, batch 4650, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4957456.45 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:17:18,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1015973.3333333334, ans=0.035 2023-12-23 07:17:22,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1015973.3333333334, ans=0.025 2023-12-23 07:17:38,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1016106.6666666666, ans=0.125 2023-12-23 07:17:44,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016106.6666666666, ans=0.1 2023-12-23 07:17:47,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1016173.3333333334, ans=0.0 2023-12-23 07:17:48,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.93 vs. limit=22.5 2023-12-23 07:17:49,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1016173.3333333334, ans=0.2 2023-12-23 07:17:50,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1016173.3333333334, ans=0.1 2023-12-23 07:18:04,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0 2023-12-23 07:18:06,524 INFO [train.py:886] (1/4) Epoch 32, batch 4700, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4956333.17 frames. 
], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:18:14,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1016306.6666666666, ans=0.125 2023-12-23 07:18:21,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1016373.3333333334, ans=0.0 2023-12-23 07:18:21,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016373.3333333334, ans=0.1 2023-12-23 07:18:23,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1016373.3333333334, ans=0.2 2023-12-23 07:18:26,132 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.128e+01 3.383e+01 3.517e+01 3.637e+01 4.392e+01, threshold=7.033e+01, percent-clipped=0.0 2023-12-23 07:18:27,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016440.0, ans=0.1 2023-12-23 07:18:32,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1016440.0, ans=0.0 2023-12-23 07:18:43,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1016573.3333333334, ans=0.125 2023-12-23 07:18:53,541 INFO [train.py:886] (1/4) Epoch 32, batch 4750, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4950639.22 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:18:53,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1016640.0, ans=0.2 2023-12-23 07:19:00,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1016640.0, ans=0.125 2023-12-23 07:19:04,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1016706.6666666666, ans=0.125 2023-12-23 07:19:25,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016746.6666666666, ans=0.1 2023-12-23 07:19:29,350 INFO [train.py:886] (1/4) Epoch 33, batch 0, loss[loss=0.02854, audio_tagging_loss=0.02854, over 20361.00 frames. ], tot_loss[loss=0.02854, audio_tagging_loss=0.02854, over 20361.00 frames. ], batch size: 107, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:19:29,351 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 07:19:50,856 INFO [train.py:917] (1/4) Epoch 33, validation: loss=0.03278, audio_tagging_loss=0.03278, over 3737520.00 frames. 
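Epoch 33 begins here with the learning rate stepping from 3.34e-03 to 3.29e-03, consistent with a schedule that also decays per epoch, while the validation loss edges down from 0.03345 to 0.03278. Note that batch_count in the ScheduledFloat lines (about 1.0167e6) is cumulative over the whole run rather than reset at the epoch boundary. Each ScheduledFloat record reports the current value (ans=...) of a float hyperparameter, such as a dropout probability or skip rate, evaluated against batch_count. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; it is illustrative rather than the exact scaling.py class, and the breakpoint values in the usage lines are made up.

    # Minimal sketch (illustrative, not the exact scaling.py class) of a
    # float hyperparameter scheduled piecewise-linearly against batch_count,
    # matching the "ScheduledFloat: name=..., batch_count=..., ans=..." lines.
    from bisect import bisect_right


    class ScheduledFloat:
        def __init__(self, *points, default: float = 0.0):
            # points are (batch_count, value) breakpoints, e.g.
            # ScheduledFloat((0.0, 0.3), (20000.0, 0.1)).
            self.points = sorted(points)
            self.default = default
            self.batch_count = None  # set by the training loop each step

        def __float__(self) -> float:
            if self.batch_count is None or not self.points:
                return self.default
            xs = [x for x, _ in self.points]
            i = bisect_right(xs, self.batch_count)
            if i == 0:  # before the first breakpoint: clamp
                return self.points[0][1]
            if i == len(self.points):  # past the last breakpoint: clamp
                return self.points[-1][1]
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            t = (self.batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


    # Hypothetical usage: a dropout probability decaying from 0.3 to 0.1
    # over the first 20k batches, then held at the floor.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 1016746.0  # cumulative, as in the log above
    print(float(dropout_p))  # -> 0.1, the late-training floor

Such a schedule would read 0.2 at batch_count 10000 and 0.1 from batch 20000 onward, matching the pattern in these logs of early-training values decaying toward a fixed floor.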
2023-12-23 07:19:50,857 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 07:19:53,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1016746.6666666666, ans=0.035 2023-12-23 07:19:56,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1016746.6666666666, ans=0.0 2023-12-23 07:19:58,718 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:20:13,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1016880.0, ans=0.125 2023-12-23 07:20:19,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=15.0 2023-12-23 07:20:32,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1017013.3333333334, ans=0.2 2023-12-23 07:20:32,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1017013.3333333334, ans=0.0 2023-12-23 07:20:34,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=12.0 2023-12-23 07:20:41,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1017080.0, ans=0.125 2023-12-23 07:20:42,067 INFO [train.py:886] (1/4) Epoch 33, batch 50, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01912, audio_tagging_loss=0.01912, over 1118895.26 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:20:42,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1017080.0, ans=0.2 2023-12-23 07:20:46,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2023-12-23 07:20:46,767 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.141e+01 3.573e+01 4.216e+01 4.740e+01 9.407e+01, threshold=8.432e+01, percent-clipped=7.0 2023-12-23 07:20:47,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1017080.0, ans=0.09899494936611666 2023-12-23 07:20:49,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1017080.0, ans=0.0 2023-12-23 07:20:58,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1017146.6666666666, ans=0.125 2023-12-23 07:21:05,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1017213.3333333334, ans=0.1 2023-12-23 07:21:08,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1017213.3333333334, ans=0.125 2023-12-23 07:21:08,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0 2023-12-23 07:21:34,673 INFO [train.py:886] (1/4) Epoch 33, batch 100, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. 
], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 1971521.01 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:21:38,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1017413.3333333334, ans=0.125 2023-12-23 07:21:40,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2023-12-23 07:21:45,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1017480.0, ans=0.125 2023-12-23 07:21:45,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1017480.0, ans=0.125 2023-12-23 07:21:47,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1017480.0, ans=0.125 2023-12-23 07:21:48,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1017480.0, ans=0.07 2023-12-23 07:21:48,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1017480.0, ans=0.125 2023-12-23 07:21:59,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1017546.6666666666, ans=0.125 2023-12-23 07:22:03,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-12-23 07:22:11,214 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.032e-02 2023-12-23 07:22:15,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=22.5 2023-12-23 07:22:18,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1017680.0, ans=0.2 2023-12-23 07:22:24,861 INFO [train.py:886] (1/4) Epoch 33, batch 150, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 2637626.85 frames. 
], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:22:30,334 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.570e+01 3.774e+01 4.009e+01 4.712e+01, threshold=7.548e+01, percent-clipped=0.0 2023-12-23 07:22:32,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1017746.6666666666, ans=0.125 2023-12-23 07:22:36,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1017813.3333333334, ans=0.0 2023-12-23 07:22:41,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1017813.3333333334, ans=0.125 2023-12-23 07:23:05,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1018013.3333333334, ans=0.1 2023-12-23 07:23:10,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1018013.3333333334, ans=0.125 2023-12-23 07:23:11,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1018013.3333333334, ans=0.125 2023-12-23 07:23:16,433 INFO [train.py:886] (1/4) Epoch 33, batch 200, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24027.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 3149755.98 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:23:25,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1018146.6666666666, ans=0.0 2023-12-23 07:23:30,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1018146.6666666666, ans=0.125 2023-12-23 07:23:55,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1018346.6666666666, ans=0.2 2023-12-23 07:23:57,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1018346.6666666666, ans=0.125 2023-12-23 07:24:06,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2023-12-23 07:24:07,460 INFO [train.py:886] (1/4) Epoch 33, batch 250, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 3548175.40 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:24:07,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1018413.3333333334, ans=0.1 2023-12-23 07:24:12,233 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.107e+01 3.419e+01 3.522e+01 3.696e+01 4.416e+01, threshold=7.043e+01, percent-clipped=0.0 2023-12-23 07:24:12,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. 
2023-12-23 07:24:31,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1018546.6666666666, ans=0.2
2023-12-23 07:24:34,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1018546.6666666666, ans=0.1
2023-12-23 07:24:44,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1018613.3333333334, ans=0.1
2023-12-23 07:24:54,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1018680.0, ans=0.125
2023-12-23 07:24:58,550 INFO [train.py:886] (1/4) Epoch 33, batch 300, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 3853087.43 frames. ], batch size: 99, lr: 3.29e-03, grad_scale: 32.0
2023-12-23 07:25:03,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1018746.6666666666, ans=0.05
2023-12-23 07:25:51,269 INFO [train.py:886] (1/4) Epoch 33, batch 350, loss[loss=0.01262, audio_tagging_loss=0.01262, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4091706.09 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:25:56,051 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.335e+01 3.546e+01 3.715e+01 4.310e+01, threshold=7.092e+01, percent-clipped=0.0
2023-12-23 07:26:08,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1019146.6666666666, ans=0.1
2023-12-23 07:26:25,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1019280.0, ans=0.025
2023-12-23 07:26:33,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1019346.6666666666, ans=0.1
2023-12-23 07:26:42,793 INFO [train.py:886] (1/4) Epoch 33, batch 400, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4284581.34 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:26:45,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1019413.3333333334, ans=0.125
2023-12-23 07:26:52,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0
2023-12-23 07:26:55,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1019480.0, ans=0.0
2023-12-23 07:27:03,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1019546.6666666666, ans=0.0
2023-12-23 07:27:03,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0
2023-12-23 07:27:09,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1019546.6666666666, ans=0.04949747468305833
2023-12-23 07:27:11,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1019546.6666666666, ans=0.2
2023-12-23 07:27:11,604 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.262e-03
2023-12-23 07:27:26,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0
2023-12-23 07:27:29,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1019680.0, ans=0.0
2023-12-23 07:27:34,269 INFO [train.py:886] (1/4) Epoch 33, batch 450, loss[loss=0.01275, audio_tagging_loss=0.01275, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4429011.57 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:27:38,961 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.932e+01 3.272e+01 3.469e+01 3.632e+01 4.131e+01, threshold=6.938e+01, percent-clipped=0.0
2023-12-23 07:27:39,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0
2023-12-23 07:27:43,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1019813.3333333334, ans=0.125
2023-12-23 07:27:46,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019813.3333333334, ans=0.1
2023-12-23 07:27:52,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1019813.3333333334, ans=0.125
2023-12-23 07:28:11,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0
2023-12-23 07:28:23,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1020013.3333333334, ans=0.125
2023-12-23 07:28:27,369 INFO [train.py:886] (1/4) Epoch 33, batch 500, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4543500.99 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:28:27,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1020080.0, ans=0.125
2023-12-23 07:28:27,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1020080.0, ans=0.125
2023-12-23 07:28:30,246 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:28:32,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1020080.0, ans=0.0
2023-12-23 07:28:39,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1020146.6666666666, ans=0.0
2023-12-23 07:28:44,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1020146.6666666666, ans=0.1
2023-12-23 07:29:10,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1020346.6666666666, ans=0.125
2023-12-23 07:29:11,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5
2023-12-23 07:29:18,733 INFO [train.py:886] (1/4) Epoch 33, batch 550, loss[loss=0.01717, audio_tagging_loss=0.01717, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4636501.73 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:29:23,355 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.066e+01 3.338e+01 3.495e+01 3.646e+01 4.151e+01, threshold=6.991e+01, percent-clipped=0.0
2023-12-23 07:29:26,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1020413.3333333334, ans=0.2
2023-12-23 07:29:30,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1020480.0, ans=0.1
2023-12-23 07:29:40,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=12.0
2023-12-23 07:30:02,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1020680.0, ans=0.2
2023-12-23 07:30:07,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1020680.0, ans=0.0
2023-12-23 07:30:11,211 INFO [train.py:886] (1/4) Epoch 33, batch 600, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4703710.99 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:30:12,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5
2023-12-23 07:30:30,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1020880.0, ans=0.125
2023-12-23 07:30:44,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1020946.6666666666, ans=0.025
2023-12-23 07:30:50,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1021013.3333333334, ans=0.125
2023-12-23 07:30:52,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1021013.3333333334, ans=0.0
2023-12-23 07:31:01,908 INFO [train.py:886] (1/4) Epoch 33, batch 650, loss[loss=0.01282, audio_tagging_loss=0.01282, over 23957.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4754250.65 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:31:04,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0
2023-12-23 07:31:07,458 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.031e+01 3.397e+01 3.533e+01 3.690e+01 3.984e+01, threshold=7.067e+01, percent-clipped=0.0
2023-12-23 07:31:21,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1021213.3333333334, ans=0.125
2023-12-23 07:31:22,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021213.3333333334, ans=0.1
2023-12-23 07:31:30,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1021213.3333333334, ans=0.2
2023-12-23 07:31:34,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:31:54,009 INFO [train.py:886] (1/4) Epoch 33, batch 700, loss[loss=0.009728, audio_tagging_loss=0.009728, over 24038.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4792543.90 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0
2023-12-23 07:32:04,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1021480.0, ans=0.0
2023-12-23 07:32:07,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1021480.0, ans=0.05
2023-12-23 07:32:09,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1021480.0, ans=0.125
2023-12-23 07:32:15,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1021546.6666666666, ans=0.125
2023-12-23 07:32:29,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1021613.3333333334, ans=0.0
2023-12-23 07:32:43,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1021680.0, ans=0.125
2023-12-23 07:32:46,957 INFO [train.py:886] (1/4) Epoch 33, batch 750, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4830654.08 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:32:51,661 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.077e+01 3.365e+01 3.500e+01 3.684e+01 4.096e+01, threshold=7.001e+01, percent-clipped=0.0
2023-12-23 07:33:06,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.27 vs. limit=22.5
2023-12-23 07:33:10,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0
2023-12-23 07:33:10,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=22.5
2023-12-23 07:33:24,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1021946.6666666666, ans=10.0
2023-12-23 07:33:26,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1022013.3333333334, ans=0.125
2023-12-23 07:33:37,153 INFO [train.py:886] (1/4) Epoch 33, batch 800, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4853315.84 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:33:44,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1022080.0, ans=0.0
2023-12-23 07:34:22,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1022346.6666666666, ans=0.125
2023-12-23 07:34:27,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1022346.6666666666, ans=0.09899494936611666
2023-12-23 07:34:30,568 INFO [train.py:886] (1/4) Epoch 33, batch 850, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4881039.77 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:34:30,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1022413.3333333334, ans=0.09899494936611666
2023-12-23 07:34:35,229 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.933e+01 3.306e+01 3.423e+01 3.606e+01 5.967e+01, threshold=6.845e+01, percent-clipped=0.0
2023-12-23 07:34:35,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1022413.3333333334, ans=0.0
2023-12-23 07:34:35,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1022413.3333333334, ans=0.125
2023-12-23 07:35:09,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1022613.3333333334, ans=0.125
2023-12-23 07:35:16,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1022680.0, ans=0.0
2023-12-23 07:35:19,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1022680.0, ans=0.0
2023-12-23 07:35:21,366 INFO [train.py:886] (1/4) Epoch 33, batch 900, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4895109.19 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:35:37,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1022813.3333333334, ans=0.0
2023-12-23 07:35:48,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5
2023-12-23 07:35:54,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1022946.6666666666, ans=0.125
2023-12-23 07:35:57,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1022946.6666666666, ans=0.125
2023-12-23 07:36:00,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022946.6666666666, ans=0.1
2023-12-23 07:36:03,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0
2023-12-23 07:36:10,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1023013.3333333334, ans=0.125
2023-12-23 07:36:12,623 INFO [train.py:886] (1/4) Epoch 33, batch 950, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4905083.59 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:36:17,322 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.098e+01 3.441e+01 3.582e+01 3.744e+01 4.324e+01, threshold=7.165e+01, percent-clipped=0.0
2023-12-23 07:36:17,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5
2023-12-23 07:36:35,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0
2023-12-23 07:36:43,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1023280.0, ans=0.07
2023-12-23 07:36:54,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0
2023-12-23 07:37:01,405 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:37:04,745 INFO [train.py:886] (1/4) Epoch 33, batch 1000, loss[loss=0.01353, audio_tagging_loss=0.01353, over 22005.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4906791.76 frames. ], batch size: 107, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:37:27,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0
2023-12-23 07:37:39,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5
2023-12-23 07:37:55,592 INFO [train.py:886] (1/4) Epoch 33, batch 1050, loss[loss=0.01016, audio_tagging_loss=0.01016, over 24038.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4917737.74 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:38:00,266 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.049e+01 3.330e+01 3.500e+01 3.696e+01 4.249e+01, threshold=7.000e+01, percent-clipped=0.0
2023-12-23 07:38:03,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1023746.6666666666, ans=0.0
2023-12-23 07:38:05,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1023813.3333333334, ans=0.5
2023-12-23 07:38:17,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1023880.0, ans=0.125
2023-12-23 07:38:44,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1024013.3333333334, ans=0.0
2023-12-23 07:38:47,381 INFO [train.py:886] (1/4) Epoch 33, batch 1100, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4929182.49 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:39:06,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1024213.3333333334, ans=0.125
2023-12-23 07:39:36,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1024413.3333333334, ans=0.125
2023-12-23 07:39:37,336 INFO [train.py:886] (1/4) Epoch 33, batch 1150, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4933712.93 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:39:42,736 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.357e+01 3.502e+01 3.673e+01 4.162e+01, threshold=7.004e+01, percent-clipped=0.0
2023-12-23 07:39:43,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1024413.3333333334, ans=0.125
2023-12-23 07:39:58,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1024546.6666666666, ans=0.125
2023-12-23 07:40:00,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=12.0
2023-12-23 07:40:05,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024546.6666666666, ans=0.1
2023-12-23 07:40:08,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1024613.3333333334, ans=0.5
2023-12-23 07:40:16,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0
2023-12-23 07:40:21,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1024680.0, ans=0.0
2023-12-23 07:40:28,352 INFO [train.py:886] (1/4) Epoch 33, batch 1200, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24931.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4941104.23 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:40:38,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1024813.3333333334, ans=0.2
2023-12-23 07:40:42,343 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:41:01,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1024946.6666666666, ans=0.125
2023-12-23 07:41:01,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1024946.6666666666, ans=0.09899494936611666
2023-12-23 07:41:10,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1025013.3333333334, ans=0.0
2023-12-23 07:41:20,676 INFO [train.py:886] (1/4) Epoch 33, batch 1250, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4936229.45 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0
2023-12-23 07:41:20,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1025080.0, ans=0.125
2023-12-23 07:41:25,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.073e+01 3.391e+01 3.513e+01 3.731e+01 4.516e+01, threshold=7.026e+01, percent-clipped=0.0
2023-12-23 07:41:30,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1025146.6666666666, ans=0.125
2023-12-23 07:41:38,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1025146.6666666666, ans=0.0
2023-12-23 07:41:41,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1025213.3333333334, ans=0.2
2023-12-23 07:41:46,525 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:41:55,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025280.0, ans=0.1
2023-12-23 07:42:06,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1025346.6666666666, ans=0.125
2023-12-23 07:42:12,427 INFO [train.py:886] (1/4) Epoch 33, batch 1300, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24026.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4937474.55 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:42:29,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1025480.0, ans=0.0
2023-12-23 07:42:35,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025546.6666666666, ans=0.1
2023-12-23 07:42:44,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1025613.3333333334, ans=0.09899494936611666
2023-12-23 07:42:48,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1025613.3333333334, ans=0.125
2023-12-23 07:43:04,407 INFO [train.py:886] (1/4) Epoch 33, batch 1350, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4936802.48 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:43:09,135 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.898e+01 3.394e+01 3.555e+01 3.712e+01 4.283e+01, threshold=7.109e+01, percent-clipped=0.0
2023-12-23 07:43:21,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0
2023-12-23 07:43:57,045 INFO [train.py:886] (1/4) Epoch 33, batch 1400, loss[loss=0.01114, audio_tagging_loss=0.01114, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4942563.99 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:43:59,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1026080.0, ans=0.125
2023-12-23 07:43:59,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0
2023-12-23 07:44:03,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0
2023-12-23 07:44:07,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1026146.6666666666, ans=0.125
2023-12-23 07:44:19,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1026213.3333333334, ans=0.0
2023-12-23 07:44:26,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1026213.3333333334, ans=0.0
2023-12-23 07:44:29,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.27 vs. limit=10.0
2023-12-23 07:44:30,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026280.0, ans=0.1
2023-12-23 07:44:47,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1026413.3333333334, ans=0.125
2023-12-23 07:44:48,417 INFO [train.py:886] (1/4) Epoch 33, batch 1450, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4946513.47 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:44:53,832 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.346e+01 3.476e+01 3.616e+01 4.118e+01, threshold=6.952e+01, percent-clipped=0.0
2023-12-23 07:44:55,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1026413.3333333334, ans=0.0
2023-12-23 07:44:56,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1026413.3333333334, ans=0.125
2023-12-23 07:45:17,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.68 vs. limit=10.0
2023-12-23 07:45:25,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1026613.3333333334, ans=0.1
2023-12-23 07:45:34,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0
2023-12-23 07:45:40,601 INFO [train.py:886] (1/4) Epoch 33, batch 1500, loss[loss=0.01051, audio_tagging_loss=0.01051, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4950011.73 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:45:49,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1026813.3333333334, ans=0.125
2023-12-23 07:45:59,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0
2023-12-23 07:46:32,825 INFO [train.py:886] (1/4) Epoch 33, batch 1550, loss[loss=0.01343, audio_tagging_loss=0.01343, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4954883.58 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:46:38,190 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.078e+01 3.419e+01 3.555e+01 3.697e+01 4.231e+01, threshold=7.109e+01, percent-clipped=0.0
2023-12-23 07:46:49,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1027146.6666666666, ans=0.125
2023-12-23 07:46:55,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1027213.3333333334, ans=0.0
2023-12-23 07:47:00,594 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:47:03,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1027280.0, ans=0.125
2023-12-23 07:47:05,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1027280.0, ans=0.125
2023-12-23 07:47:08,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1027280.0, ans=0.2
2023-12-23 07:47:10,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1027280.0, ans=0.125
2023-12-23 07:47:11,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0
2023-12-23 07:47:14,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1027346.6666666666, ans=0.125
2023-12-23 07:47:17,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1027346.6666666666, ans=0.0
2023-12-23 07:47:20,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027346.6666666666, ans=0.1
2023-12-23 07:47:20,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1027346.6666666666, ans=0.2
2023-12-23 07:47:23,694 INFO [train.py:886] (1/4) Epoch 33, batch 1600, loss[loss=0.01333, audio_tagging_loss=0.01333, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4950502.55 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:47:45,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1027546.6666666666, ans=0.125
2023-12-23 07:47:47,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1027546.6666666666, ans=0.035
2023-12-23 07:47:55,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1027613.3333333334, ans=0.125
2023-12-23 07:48:06,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1027680.0, ans=0.04949747468305833
2023-12-23 07:48:16,987 INFO [train.py:886] (1/4) Epoch 33, batch 1650, loss[loss=0.009235, audio_tagging_loss=0.009235, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4943848.09 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:48:19,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1027746.6666666666, ans=0.125
2023-12-23 07:48:21,581 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.029e+01 3.394e+01 3.522e+01 3.682e+01 5.123e+01, threshold=7.045e+01, percent-clipped=0.0
2023-12-23 07:48:21,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1027746.6666666666, ans=0.0
2023-12-23 07:48:38,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1027880.0, ans=0.125
2023-12-23 07:48:40,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1027880.0, ans=0.125
2023-12-23 07:48:47,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1027946.6666666666, ans=0.0
2023-12-23 07:49:05,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1028013.3333333334, ans=0.125
2023-12-23 07:49:08,126 INFO [train.py:886] (1/4) Epoch 33, batch 1700, loss[loss=0.0135, audio_tagging_loss=0.0135, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4947107.96 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:49:18,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0
2023-12-23 07:49:38,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1028280.0, ans=0.1
2023-12-23 07:49:39,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1028280.0, ans=0.1
2023-12-23 07:49:43,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1028280.0, ans=0.2
2023-12-23 07:49:46,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1028280.0, ans=0.125
2023-12-23 07:49:48,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1028280.0, ans=0.125
2023-12-23 07:49:49,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1028346.6666666666, ans=0.125
2023-12-23 07:49:59,795 INFO [train.py:886] (1/4) Epoch 33, batch 1750, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4949351.93 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:50:00,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1028413.3333333334, ans=0.0
2023-12-23 07:50:04,544 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.030e+01 3.329e+01 3.475e+01 3.627e+01 4.397e+01, threshold=6.950e+01, percent-clipped=0.0
2023-12-23 07:50:09,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0
2023-12-23 07:50:16,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1028480.0, ans=0.0
2023-12-23 07:50:17,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2023-12-23 07:50:32,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1028613.3333333334, ans=0.125
2023-12-23 07:50:41,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1028680.0, ans=0.1
2023-12-23 07:50:44,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1028680.0, ans=0.125
2023-12-23 07:50:45,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1028680.0, ans=0.0
2023-12-23 07:50:46,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.90 vs. limit=22.5
2023-12-23 07:50:52,831 INFO [train.py:886] (1/4) Epoch 33, batch 1800, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4953885.06 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:51:18,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028880.0, ans=0.1
2023-12-23 07:51:25,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1028946.6666666666, ans=0.0
2023-12-23 07:51:36,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1029013.3333333334, ans=0.1
2023-12-23 07:51:36,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=12.0
2023-12-23 07:51:42,088 INFO [train.py:886] (1/4) Epoch 33, batch 1850, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24028.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4958611.40 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:51:44,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2023-12-23 07:51:46,822 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.019e+01 3.396e+01 3.524e+01 3.649e+01 4.087e+01, threshold=7.047e+01, percent-clipped=0.0
2023-12-23 07:51:54,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1029146.6666666666, ans=0.125
2023-12-23 07:52:04,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1029213.3333333334, ans=0.035
2023-12-23 07:52:06,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1029213.3333333334, ans=0.125
2023-12-23 07:52:16,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5
2023-12-23 07:52:17,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1029280.0, ans=0.0
2023-12-23 07:52:26,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1029346.6666666666, ans=0.125
2023-12-23 07:52:31,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1029346.6666666666, ans=0.07
2023-12-23 07:52:35,176 INFO [train.py:886] (1/4) Epoch 33, batch 1900, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4952800.00 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:52:55,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1029546.6666666666, ans=0.025
2023-12-23 07:52:56,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1029546.6666666666, ans=0.1
2023-12-23 07:52:58,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=12.0
2023-12-23 07:53:03,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1029546.6666666666, ans=0.1
2023-12-23 07:53:07,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0
2023-12-23 07:53:13,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0
2023-12-23 07:53:14,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1029613.3333333334, ans=0.125
2023-12-23 07:53:26,790 INFO [train.py:886] (1/4) Epoch 33, batch 1950, loss[loss=0.01419, audio_tagging_loss=0.01419, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4951515.29 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:53:30,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1029746.6666666666, ans=0.0
2023-12-23 07:53:32,178 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.173e+01 3.392e+01 3.533e+01 3.748e+01 4.233e+01, threshold=7.067e+01, percent-clipped=0.0
2023-12-23 07:53:32,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1029746.6666666666, ans=0.125
2023-12-23 07:53:35,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1029746.6666666666, ans=15.0
2023-12-23 07:53:35,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1029746.6666666666, ans=0.125
2023-12-23 07:53:58,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1029946.6666666666, ans=0.025
2023-12-23 07:54:10,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1030013.3333333334, ans=0.125
2023-12-23 07:54:18,567 INFO [train.py:886] (1/4) Epoch 33, batch 2000, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4952552.52 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:54:18,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1030080.0, ans=0.025
2023-12-23 07:54:36,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1030146.6666666666, ans=0.05
2023-12-23 07:54:45,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1030213.3333333334, ans=0.07
2023-12-23 07:54:50,049 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 07:54:52,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1030280.0, ans=0.0
2023-12-23 07:54:55,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0
2023-12-23 07:54:55,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0
2023-12-23 07:54:59,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1030346.6666666666, ans=0.125
2023-12-23 07:55:02,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1030346.6666666666, ans=0.2
2023-12-23 07:55:10,775 INFO [train.py:886] (1/4) Epoch 33, batch 2050, loss[loss=0.01477, audio_tagging_loss=0.01477, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4951889.01 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:55:16,251 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.917e+01 3.317e+01 3.445e+01 3.668e+01 4.108e+01, threshold=6.890e+01, percent-clipped=0.0
2023-12-23 07:55:16,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1030413.3333333334, ans=0.125
2023-12-23 07:55:17,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0
2023-12-23 07:55:26,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1030480.0, ans=0.5
2023-12-23 07:55:33,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0
2023-12-23 07:55:41,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1030613.3333333334, ans=0.0
2023-12-23 07:56:01,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1030746.6666666666, ans=0.0
2023-12-23 07:56:01,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1030746.6666666666, ans=0.125
2023-12-23 07:56:01,869 INFO [train.py:886] (1/4) Epoch 33, batch 2100, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4960974.15 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:56:02,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1030746.6666666666, ans=0.95
2023-12-23 07:56:22,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030813.3333333334, ans=0.1
2023-12-23 07:56:35,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1030946.6666666666, ans=0.2
2023-12-23 07:56:35,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1030946.6666666666, ans=0.125
2023-12-23 07:56:37,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1030946.6666666666, ans=0.0
2023-12-23 07:56:54,962 INFO [train.py:886] (1/4) Epoch 33, batch 2150, loss[loss=0.008522, audio_tagging_loss=0.008522, over 24032.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4963779.85 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:56:59,623 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.872e+01 3.355e+01 3.526e+01 3.683e+01 4.612e+01, threshold=7.052e+01, percent-clipped=0.0
2023-12-23 07:57:02,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0
2023-12-23 07:57:06,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1031146.6666666666, ans=0.125
2023-12-23 07:57:09,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0
2023-12-23 07:57:09,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1031146.6666666666, ans=0.0
2023-12-23 07:57:18,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0
2023-12-23 07:57:27,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1031280.0, ans=0.2
2023-12-23 07:57:32,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1031280.0, ans=0.125
2023-12-23 07:57:38,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1031346.6666666666, ans=0.09899494936611666
2023-12-23 07:57:46,398 INFO [train.py:886] (1/4) Epoch 33, batch 2200, loss[loss=0.009985, audio_tagging_loss=0.009985, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4955024.19 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0
2023-12-23 07:57:50,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1031413.3333333334, ans=0.125
2023-12-23 07:57:54,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1031413.3333333334, ans=0.0
2023-12-23 07:58:04,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1031480.0, ans=0.0
2023-12-23 07:58:20,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1031613.3333333334, ans=0.0
2023-12-23 07:58:21,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1031613.3333333334, ans=0.2
2023-12-23 07:58:32,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1031680.0, ans=0.125
2023-12-23 07:58:38,167 INFO [train.py:886] (1/4) Epoch 33, batch 2250, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4946390.15 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 07:58:42,940 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.057e+01 3.375e+01 3.511e+01 3.690e+01 4.571e+01, threshold=7.022e+01, percent-clipped=0.0
2023-12-23 07:58:59,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5
2023-12-23 07:59:04,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0
2023-12-23 07:59:10,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1031946.6666666666, ans=0.125
2023-12-23 07:59:10,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1031946.6666666666, ans=0.125
2023-12-23 07:59:14,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1031946.6666666666, ans=0.125
2023-12-23 07:59:25,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1032013.3333333334, ans=0.125
2023-12-23 07:59:30,936 INFO [train.py:886] (1/4) Epoch 33, batch 2300, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4945238.86 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 07:59:36,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1032080.0, ans=0.1
2023-12-23 07:59:37,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1032080.0, ans=0.025
2023-12-23 07:59:43,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1032146.6666666666, ans=0.0
2023-12-23 07:59:51,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1032213.3333333334, ans=0.1
2023-12-23 08:00:03,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1032280.0, ans=0.0
2023-12-23 08:00:07,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1032280.0, ans=0.2
2023-12-23 08:00:16,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1032346.6666666666, ans=0.04949747468305833
2023-12-23 08:00:23,005 INFO [train.py:886] (1/4) Epoch 33, batch 2350, loss[loss=0.01161, audio_tagging_loss=0.01161, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4953275.42 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:00:28,462 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.321e+01 3.472e+01 3.677e+01 4.298e+01, threshold=6.945e+01, percent-clipped=0.0
2023-12-23 08:00:34,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2023-12-23 08:00:35,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1032480.0, ans=0.0
2023-12-23 08:01:14,864 INFO [train.py:886] (1/4) Epoch 33, batch 2400, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4952059.73 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:01:16,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1032746.6666666666, ans=0.2 2023-12-23 08:01:19,608 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:01:23,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1032746.6666666666, ans=0.125 2023-12-23 08:01:29,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-12-23 08:01:35,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1032880.0, ans=0.0 2023-12-23 08:01:43,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1032880.0, ans=0.1 2023-12-23 08:01:47,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1032946.6666666666, ans=0.0 2023-12-23 08:02:07,480 INFO [train.py:886] (1/4) Epoch 33, batch 2450, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4951756.91 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:02:12,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-12-23 08:02:12,824 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.321e+01 3.452e+01 3.648e+01 4.269e+01, threshold=6.903e+01, percent-clipped=0.0 2023-12-23 08:02:13,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1033080.0, ans=0.2 2023-12-23 08:02:14,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1033080.0, ans=0.125 2023-12-23 08:02:23,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1033146.6666666666, ans=0.04949747468305833 2023-12-23 08:02:53,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-12-23 08:02:58,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1033413.3333333334, ans=0.2 2023-12-23 08:02:58,921 INFO [train.py:886] (1/4) Epoch 33, batch 2500, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4949009.39 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:02:59,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. 
limit=12.0 2023-12-23 08:03:05,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1033413.3333333334, ans=0.125 2023-12-23 08:03:23,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1033546.6666666666, ans=0.1 2023-12-23 08:03:29,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1033613.3333333334, ans=0.125 2023-12-23 08:03:48,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-12-23 08:03:51,012 INFO [train.py:886] (1/4) Epoch 33, batch 2550, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4946294.90 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:03:55,658 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.147e+01 3.396e+01 3.556e+01 3.745e+01 4.206e+01, threshold=7.112e+01, percent-clipped=0.0 2023-12-23 08:03:57,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1033746.6666666666, ans=0.125 2023-12-23 08:04:03,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1033813.3333333334, ans=0.125 2023-12-23 08:04:17,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1033880.0, ans=0.5 2023-12-23 08:04:34,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1034013.3333333334, ans=0.125 2023-12-23 08:04:35,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1034013.3333333334, ans=0.125 2023-12-23 08:04:43,673 INFO [train.py:886] (1/4) Epoch 33, batch 2600, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24001.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4937141.06 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:04:43,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1034080.0, ans=0.125 2023-12-23 08:04:55,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1034146.6666666666, ans=0.0 2023-12-23 08:04:56,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1034146.6666666666, ans=0.125 2023-12-23 08:05:12,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-23 08:05:15,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034280.0, ans=0.1 2023-12-23 08:05:17,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. 
limit=12.0 2023-12-23 08:05:34,060 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:05:34,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1034413.3333333334, ans=0.0 2023-12-23 08:05:34,823 INFO [train.py:886] (1/4) Epoch 33, batch 2650, loss[loss=0.009305, audio_tagging_loss=0.009305, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4937257.85 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:05:39,478 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.366e+01 3.521e+01 3.683e+01 4.023e+01, threshold=7.042e+01, percent-clipped=0.0 2023-12-23 08:05:59,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1034546.6666666666, ans=0.1 2023-12-23 08:06:10,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1034613.3333333334, ans=0.0 2023-12-23 08:06:12,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1034613.3333333334, ans=0.1 2023-12-23 08:06:22,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2023-12-23 08:06:27,635 INFO [train.py:886] (1/4) Epoch 33, batch 2700, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4936288.00 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:06:39,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034813.3333333334, ans=0.1 2023-12-23 08:06:47,619 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:06:50,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1034880.0, ans=0.1 2023-12-23 08:06:51,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1034880.0, ans=0.0 2023-12-23 08:06:51,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=12.0 2023-12-23 08:07:13,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035013.3333333334, ans=0.1 2023-12-23 08:07:14,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1035013.3333333334, ans=0.125 2023-12-23 08:07:17,212 INFO [train.py:886] (1/4) Epoch 33, batch 2750, loss[loss=0.01007, audio_tagging_loss=0.01007, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4947784.98 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 128.0 2023-12-23 08:07:23,276 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.910e+01 3.350e+01 3.483e+01 3.677e+01 4.348e+01, threshold=6.966e+01, percent-clipped=0.0 2023-12-23 08:07:27,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-12-23 08:07:29,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1035146.6666666666, ans=0.125 2023-12-23 08:07:30,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1035146.6666666666, ans=0.125 2023-12-23 08:08:04,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1035346.6666666666, ans=0.0 2023-12-23 08:08:09,523 INFO [train.py:886] (1/4) Epoch 33, batch 2800, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4948047.37 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:08:15,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1035413.3333333334, ans=0.125 2023-12-23 08:08:23,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2023-12-23 08:08:28,856 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:08:32,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1035546.6666666666, ans=0.125 2023-12-23 08:08:34,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-12-23 08:08:56,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035680.0, ans=0.1 2023-12-23 08:09:01,183 INFO [train.py:886] (1/4) Epoch 33, batch 2850, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4936518.94 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:09:01,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1035746.6666666666, ans=0.125 2023-12-23 08:09:07,509 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.083e+01 3.407e+01 3.538e+01 3.728e+01 5.942e+01, threshold=7.077e+01, percent-clipped=0.0 2023-12-23 08:09:08,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1035746.6666666666, ans=0.0 2023-12-23 08:09:18,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1035813.3333333334, ans=0.125 2023-12-23 08:09:20,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. 
limit=15.0 2023-12-23 08:09:20,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-12-23 08:09:38,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2023-12-23 08:09:42,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1036013.3333333334, ans=0.0 2023-12-23 08:09:52,024 INFO [train.py:886] (1/4) Epoch 33, batch 2900, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4933552.97 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:10:01,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1036080.0, ans=0.2 2023-12-23 08:10:13,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1036213.3333333334, ans=0.125 2023-12-23 08:10:19,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1036213.3333333334, ans=0.125 2023-12-23 08:10:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1036280.0, ans=0.125 2023-12-23 08:10:45,084 INFO [train.py:886] (1/4) Epoch 33, batch 2950, loss[loss=0.008366, audio_tagging_loss=0.008366, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4934664.79 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:10:45,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-12-23 08:10:47,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1036413.3333333334, ans=0.0 2023-12-23 08:10:47,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1036413.3333333334, ans=0.125 2023-12-23 08:10:50,740 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.310e+01 3.454e+01 3.690e+01 4.834e+01, threshold=6.907e+01, percent-clipped=0.0 2023-12-23 08:11:15,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1036613.3333333334, ans=10.0 2023-12-23 08:11:21,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1036613.3333333334, ans=0.125 2023-12-23 08:11:25,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2023-12-23 08:11:27,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2023-12-23 08:11:35,886 INFO [train.py:886] (1/4) Epoch 33, batch 3000, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4942274.93 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:11:35,886 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 08:11:56,757 INFO [train.py:917] (1/4) Epoch 33, validation: loss=0.03378, audio_tagging_loss=0.03378, over 3737520.00 frames. 2023-12-23 08:11:56,757 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 08:12:01,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1036746.6666666666, ans=0.0 2023-12-23 08:12:17,356 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:12:20,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1036880.0, ans=0.0 2023-12-23 08:12:24,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1036880.0, ans=0.0 2023-12-23 08:12:38,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1037013.3333333334, ans=0.2 2023-12-23 08:12:41,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1037013.3333333334, ans=0.125 2023-12-23 08:12:49,276 INFO [train.py:886] (1/4) Epoch 33, batch 3050, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4947140.52 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:12:54,860 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.120e+01 3.381e+01 3.541e+01 3.697e+01 4.146e+01, threshold=7.081e+01, percent-clipped=0.0 2023-12-23 08:13:03,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1037146.6666666666, ans=0.5 2023-12-23 08:13:15,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1037213.3333333334, ans=0.95 2023-12-23 08:13:23,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1037280.0, ans=0.1 2023-12-23 08:13:23,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1037280.0, ans=0.0 2023-12-23 08:13:26,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2023-12-23 08:13:33,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=12.0 2023-12-23 08:13:34,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-12-23 08:13:40,929 INFO [train.py:886] (1/4) Epoch 33, batch 3100, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4949909.23 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:13:44,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1037413.3333333334, ans=0.0 2023-12-23 08:13:54,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1037480.0, ans=0.125 2023-12-23 08:13:57,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-12-23 08:14:20,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1037613.3333333334, ans=0.0 2023-12-23 08:14:22,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1037680.0, ans=0.125 2023-12-23 08:14:28,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1037680.0, ans=0.0 2023-12-23 08:14:32,357 INFO [train.py:886] (1/4) Epoch 33, batch 3150, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4944394.21 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:14:34,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1037746.6666666666, ans=0.125 2023-12-23 08:14:38,068 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.211e+01 3.392e+01 3.546e+01 3.675e+01 4.446e+01, threshold=7.092e+01, percent-clipped=0.0 2023-12-23 08:14:39,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1037746.6666666666, ans=0.5 2023-12-23 08:14:41,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1037813.3333333334, ans=0.0 2023-12-23 08:14:48,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1037813.3333333334, ans=0.125 2023-12-23 08:14:50,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1037813.3333333334, ans=0.2 2023-12-23 08:15:01,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1037880.0, ans=0.125 2023-12-23 08:15:06,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1037946.6666666666, ans=0.1 2023-12-23 08:15:06,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1037946.6666666666, ans=0.125 2023-12-23 08:15:24,443 INFO [train.py:886] (1/4) Epoch 33, batch 3200, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4946622.35 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:15:34,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.56 vs. 
limit=22.5 2023-12-23 08:15:37,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1038146.6666666666, ans=0.125 2023-12-23 08:15:40,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-12-23 08:16:15,388 INFO [train.py:886] (1/4) Epoch 33, batch 3250, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4947193.64 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:16:17,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1038413.3333333334, ans=0.07 2023-12-23 08:16:22,359 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.103e+01 3.403e+01 3.565e+01 3.733e+01 4.507e+01, threshold=7.131e+01, percent-clipped=0.0 2023-12-23 08:16:43,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1038546.6666666666, ans=0.125 2023-12-23 08:16:45,882 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:16:56,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1038680.0, ans=0.125 2023-12-23 08:17:04,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1038680.0, ans=0.125 2023-12-23 08:17:08,595 INFO [train.py:886] (1/4) Epoch 33, batch 3300, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24912.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4950531.01 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:17:09,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2023-12-23 08:17:10,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1038746.6666666666, ans=0.05 2023-12-23 08:17:13,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1038746.6666666666, ans=0.0 2023-12-23 08:17:29,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1038880.0, ans=0.125 2023-12-23 08:17:41,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1038946.6666666666, ans=0.0 2023-12-23 08:17:59,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1039080.0, ans=0.125 2023-12-23 08:17:59,942 INFO [train.py:886] (1/4) Epoch 33, batch 3350, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4954175.87 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:18:06,316 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.096e+01 3.385e+01 3.532e+01 3.687e+01 4.158e+01, threshold=7.063e+01, percent-clipped=0.0 2023-12-23 08:18:19,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1039213.3333333334, ans=0.1 2023-12-23 08:18:33,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-12-23 08:18:40,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.54 vs. limit=5.0 2023-12-23 08:18:50,770 INFO [train.py:886] (1/4) Epoch 33, batch 3400, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4962035.17 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:19:11,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2023-12-23 08:19:14,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039546.6666666666, ans=0.1 2023-12-23 08:19:27,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2023-12-23 08:19:42,842 INFO [train.py:886] (1/4) Epoch 33, batch 3450, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4951361.50 frames. 
], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:19:45,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1039746.6666666666, ans=0.125 2023-12-23 08:19:48,486 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.448e+01 3.587e+01 3.703e+01 4.197e+01, threshold=7.175e+01, percent-clipped=0.0 2023-12-23 08:19:48,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1039746.6666666666, ans=0.0 2023-12-23 08:19:56,098 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:20:03,481 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:20:06,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1039880.0, ans=0.125 2023-12-23 08:20:11,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1039880.0, ans=0.125 2023-12-23 08:20:17,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1039946.6666666666, ans=0.0 2023-12-23 08:20:24,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1040013.3333333334, ans=0.125 2023-12-23 08:20:27,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1040013.3333333334, ans=0.05 2023-12-23 08:20:28,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=12.0 2023-12-23 08:20:32,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-23 08:20:33,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1040013.3333333334, ans=0.0 2023-12-23 08:20:36,033 INFO [train.py:886] (1/4) Epoch 33, batch 3500, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4947852.80 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:20:41,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1040080.0, ans=0.5 2023-12-23 08:20:43,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1040080.0, ans=0.2 2023-12-23 08:20:44,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1040080.0, ans=0.125 2023-12-23 08:20:47,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.73 vs. 
limit=15.0 2023-12-23 08:20:53,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1040146.6666666666, ans=0.0 2023-12-23 08:20:53,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1040146.6666666666, ans=0.125 2023-12-23 08:21:07,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1040280.0, ans=0.1 2023-12-23 08:21:24,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=12.0 2023-12-23 08:21:26,442 INFO [train.py:886] (1/4) Epoch 33, batch 3550, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4952077.75 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:21:32,831 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.940e+01 3.348e+01 3.483e+01 3.687e+01 4.217e+01, threshold=6.967e+01, percent-clipped=0.0 2023-12-23 08:21:38,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0 2023-12-23 08:21:50,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-12-23 08:21:53,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2023-12-23 08:22:06,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1040680.0, ans=0.125 2023-12-23 08:22:06,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1040680.0, ans=0.1 2023-12-23 08:22:18,353 INFO [train.py:886] (1/4) Epoch 33, batch 3600, loss[loss=0.01096, audio_tagging_loss=0.01096, over 21514.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4946661.88 frames. ], batch size: 107, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:22:20,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1040746.6666666666, ans=0.05 2023-12-23 08:22:24,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1040746.6666666666, ans=0.125 2023-12-23 08:22:24,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. 
limit=15.0 2023-12-23 08:22:25,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1040746.6666666666, ans=0.09899494936611666 2023-12-23 08:22:31,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1040813.3333333334, ans=0.125 2023-12-23 08:22:39,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1040880.0, ans=0.0 2023-12-23 08:22:57,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1040946.6666666666, ans=0.125 2023-12-23 08:22:58,478 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:22:59,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-12-23 08:22:59,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2023-12-23 08:23:01,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1041013.3333333334, ans=0.0 2023-12-23 08:23:09,592 INFO [train.py:886] (1/4) Epoch 33, batch 3650, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4957325.48 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:23:11,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0 2023-12-23 08:23:15,867 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.932e+01 3.316e+01 3.480e+01 3.651e+01 4.543e+01, threshold=6.960e+01, percent-clipped=0.0 2023-12-23 08:23:18,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1041146.6666666666, ans=0.2 2023-12-23 08:23:21,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1041146.6666666666, ans=0.07 2023-12-23 08:23:26,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1041146.6666666666, ans=0.125 2023-12-23 08:23:35,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1041213.3333333334, ans=0.125 2023-12-23 08:23:39,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-12-23 08:23:43,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1041280.0, ans=0.07 2023-12-23 08:23:48,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1041280.0, ans=0.125 2023-12-23 08:24:01,234 INFO [train.py:886] (1/4) Epoch 33, batch 3700, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4959435.37 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:24:11,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1041480.0, ans=0.0 2023-12-23 08:24:31,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1041613.3333333334, ans=0.125 2023-12-23 08:24:43,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1041680.0, ans=0.2 2023-12-23 08:24:51,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1041746.6666666666, ans=0.2 2023-12-23 08:24:52,538 INFO [train.py:886] (1/4) Epoch 33, batch 3750, loss[loss=0.00754, audio_tagging_loss=0.00754, over 24035.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4953827.19 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:24:59,664 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.113e+01 3.433e+01 3.584e+01 3.717e+01 4.082e+01, threshold=7.168e+01, percent-clipped=0.0 2023-12-23 08:25:13,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1041880.0, ans=0.07 2023-12-23 08:25:28,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1041946.6666666666, ans=0.1 2023-12-23 08:25:44,410 INFO [train.py:886] (1/4) Epoch 33, batch 3800, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4944042.53 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:26:07,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1042213.3333333334, ans=0.125 2023-12-23 08:26:11,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1042213.3333333334, ans=0.0 2023-12-23 08:26:16,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1042280.0, ans=0.0 2023-12-23 08:26:22,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1042280.0, ans=0.125 2023-12-23 08:26:36,534 INFO [train.py:886] (1/4) Epoch 33, batch 3850, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4945850.08 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:26:42,211 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.174e+01 3.454e+01 3.600e+01 3.785e+01 4.455e+01, threshold=7.200e+01, percent-clipped=0.0 2023-12-23 08:26:44,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1042413.3333333334, ans=0.0 2023-12-23 08:26:46,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. 
limit=15.0 2023-12-23 08:26:52,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1042480.0, ans=0.0 2023-12-23 08:26:53,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1042480.0, ans=0.1 2023-12-23 08:27:15,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1042613.3333333334, ans=0.0 2023-12-23 08:27:26,759 INFO [train.py:886] (1/4) Epoch 33, batch 3900, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4946892.31 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:27:34,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.45 vs. limit=10.0 2023-12-23 08:27:40,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1042813.3333333334, ans=0.05 2023-12-23 08:27:44,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1042813.3333333334, ans=0.125 2023-12-23 08:27:49,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1042880.0, ans=0.125 2023-12-23 08:28:11,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1043013.3333333334, ans=0.125 2023-12-23 08:28:18,537 INFO [train.py:886] (1/4) Epoch 33, batch 3950, loss[loss=0.01211, audio_tagging_loss=0.01211, over 23986.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4947374.72 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:28:22,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1043080.0, ans=0.0 2023-12-23 08:28:24,295 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.369e+01 3.508e+01 3.685e+01 5.218e+01, threshold=7.015e+01, percent-clipped=0.0 2023-12-23 08:28:33,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1043146.6666666666, ans=0.0 2023-12-23 08:28:43,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1043213.3333333334, ans=0.125 2023-12-23 08:29:03,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1043346.6666666666, ans=0.125 2023-12-23 08:29:06,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1043346.6666666666, ans=0.125 2023-12-23 08:29:09,813 INFO [train.py:886] (1/4) Epoch 33, batch 4000, loss[loss=0.01027, audio_tagging_loss=0.01027, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4946692.20 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:29:11,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1043413.3333333334, ans=0.0 2023-12-23 08:29:15,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1043413.3333333334, ans=0.125 2023-12-23 08:29:18,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1043413.3333333334, ans=0.125 2023-12-23 08:29:24,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1043480.0, ans=0.125 2023-12-23 08:29:24,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1043480.0, ans=0.0 2023-12-23 08:29:29,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1043546.6666666666, ans=0.015 2023-12-23 08:29:35,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=22.5 2023-12-23 08:29:39,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.51 vs. limit=22.5 2023-12-23 08:30:00,661 INFO [train.py:886] (1/4) Epoch 33, batch 4050, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24055.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4952673.19 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:30:06,405 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.985e+01 3.429e+01 3.579e+01 3.740e+01 4.198e+01, threshold=7.158e+01, percent-clipped=0.0 2023-12-23 08:30:07,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1043746.6666666666, ans=0.07 2023-12-23 08:30:11,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1043813.3333333334, ans=0.0 2023-12-23 08:30:17,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1043813.3333333334, ans=0.0 2023-12-23 08:30:31,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1043946.6666666666, ans=0.0 2023-12-23 08:30:40,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2023-12-23 08:30:41,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=22.5 2023-12-23 08:30:52,103 INFO [train.py:886] (1/4) Epoch 33, batch 4100, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4942807.34 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0 2023-12-23 08:30:53,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1044080.0, ans=0.125 2023-12-23 08:31:05,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. 
limit=15.0 2023-12-23 08:31:42,125 INFO [train.py:886] (1/4) Epoch 33, batch 4150, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4935946.97 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:31:47,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=12.0 2023-12-23 08:31:48,512 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.092e+01 3.375e+01 3.544e+01 3.687e+01 4.379e+01, threshold=7.088e+01, percent-clipped=0.0 2023-12-23 08:32:32,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1044746.6666666666, ans=0.1 2023-12-23 08:32:33,498 INFO [train.py:886] (1/4) Epoch 33, batch 4200, loss[loss=0.01043, audio_tagging_loss=0.01043, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4940244.12 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:32:34,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-12-23 08:32:40,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1044746.6666666666, ans=0.02 2023-12-23 08:32:49,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1044813.3333333334, ans=0.2 2023-12-23 08:32:49,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1044813.3333333334, ans=0.2 2023-12-23 08:33:25,248 INFO [train.py:886] (1/4) Epoch 33, batch 4250, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4944430.62 frames. 
], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:33:31,640 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.356e+01 3.487e+01 3.659e+01 4.243e+01, threshold=6.975e+01, percent-clipped=0.0 2023-12-23 08:33:34,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1045146.6666666666, ans=0.1 2023-12-23 08:33:39,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1045146.6666666666, ans=0.0 2023-12-23 08:33:47,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1045213.3333333334, ans=0.09899494936611666 2023-12-23 08:33:52,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1045213.3333333334, ans=0.0 2023-12-23 08:33:53,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1045213.3333333334, ans=0.0 2023-12-23 08:33:53,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1045213.3333333334, ans=0.0 2023-12-23 08:33:55,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1045280.0, ans=0.5 2023-12-23 08:34:15,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1045413.3333333334, ans=0.2 2023-12-23 08:34:16,219 INFO [train.py:886] (1/4) Epoch 33, batch 4300, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4945647.04 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:34:22,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-12-23 08:34:34,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=22.5 2023-12-23 08:34:38,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=22.5 2023-12-23 08:34:42,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1045546.6666666666, ans=0.09899494936611666 2023-12-23 08:34:49,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1045613.3333333334, ans=0.0 2023-12-23 08:34:52,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1045613.3333333334, ans=15.0 2023-12-23 08:35:08,715 INFO [train.py:886] (1/4) Epoch 33, batch 4350, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4953161.24 frames. 
], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:35:15,036 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.438e+01 3.558e+01 3.684e+01 4.485e+01, threshold=7.115e+01, percent-clipped=0.0 2023-12-23 08:35:29,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1045880.0, ans=0.0 2023-12-23 08:35:30,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1045880.0, ans=0.125 2023-12-23 08:35:35,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1045880.0, ans=0.0 2023-12-23 08:35:39,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1045946.6666666666, ans=0.125 2023-12-23 08:35:41,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1045946.6666666666, ans=0.07 2023-12-23 08:35:47,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1045946.6666666666, ans=0.125 2023-12-23 08:35:49,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1046013.3333333334, ans=0.125 2023-12-23 08:36:01,115 INFO [train.py:886] (1/4) Epoch 33, batch 4400, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4952442.29 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:36:13,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.72 vs. limit=22.5 2023-12-23 08:36:20,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1046213.3333333334, ans=0.125 2023-12-23 08:36:23,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1046213.3333333334, ans=0.2 2023-12-23 08:36:42,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1046346.6666666666, ans=0.05 2023-12-23 08:36:52,638 INFO [train.py:886] (1/4) Epoch 33, batch 4450, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4945801.43 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:36:56,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1046413.3333333334, ans=0.1 2023-12-23 08:36:58,261 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.349e+01 3.519e+01 3.667e+01 4.264e+01, threshold=7.037e+01, percent-clipped=0.0 2023-12-23 08:37:18,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1046546.6666666666, ans=0.125 2023-12-23 08:37:21,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1046546.6666666666, ans=0.2 2023-12-23 08:37:24,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. 
limit=22.5 2023-12-23 08:37:28,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1046613.3333333334, ans=0.125 2023-12-23 08:37:29,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1046613.3333333334, ans=0.1 2023-12-23 08:37:45,002 INFO [train.py:886] (1/4) Epoch 33, batch 4500, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4946896.85 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:37:53,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1046746.6666666666, ans=0.0 2023-12-23 08:38:02,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-23 08:38:06,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1046880.0, ans=0.125 2023-12-23 08:38:13,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1046880.0, ans=0.0 2023-12-23 08:38:24,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-12-23 08:38:35,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1047080.0, ans=0.125 2023-12-23 08:38:35,947 INFO [train.py:886] (1/4) Epoch 33, batch 4550, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4949306.28 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0 2023-12-23 08:38:36,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-12-23 08:38:38,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1047080.0, ans=0.1 2023-12-23 08:38:43,054 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.099e+01 3.336e+01 3.508e+01 3.645e+01 4.432e+01, threshold=7.015e+01, percent-clipped=0.0 2023-12-23 08:38:53,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1047146.6666666666, ans=0.04949747468305833 2023-12-23 08:39:02,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1047213.3333333334, ans=0.125 2023-12-23 08:39:10,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1047280.0, ans=0.09899494936611666 2023-12-23 08:39:28,812 INFO [train.py:886] (1/4) Epoch 33, batch 4600, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4953248.99 frames. 
2023-12-23 08:39:36,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1047413.3333333334, ans=0.1
2023-12-23 08:39:40,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1047480.0, ans=0.125
2023-12-23 08:40:01,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1047613.3333333334, ans=0.0
2023-12-23 08:40:21,225 INFO [train.py:886] (1/4) Epoch 33, batch 4650, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4953045.56 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:40:23,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1047746.6666666666, ans=0.0
2023-12-23 08:40:27,559 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.430e+01 3.556e+01 3.733e+01 4.404e+01, threshold=7.113e+01, percent-clipped=0.0
2023-12-23 08:40:34,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0
2023-12-23 08:41:01,291 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:41:12,180 INFO [train.py:886] (1/4) Epoch 33, batch 4700, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4947116.14 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:41:17,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1048080.0, ans=0.09899494936611666
2023-12-23 08:41:19,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1048080.0, ans=0.0
2023-12-23 08:41:29,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2023-12-23 08:41:33,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1048213.3333333334, ans=0.0
2023-12-23 08:41:37,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1048213.3333333334, ans=0.125
2023-12-23 08:41:37,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1048213.3333333334, ans=0.0
2023-12-23 08:41:44,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1048280.0, ans=0.125
2023-12-23 08:41:54,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1048346.6666666666, ans=0.125
2023-12-23 08:41:57,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048346.6666666666, ans=0.1
2023-12-23 08:41:58,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1048346.6666666666, ans=0.0
2023-12-23 08:41:59,682 INFO [train.py:886] (1/4) Epoch 33, batch 4750, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4946553.00 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:42:05,086 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+01 3.425e+01 3.596e+01 3.749e+01 4.228e+01, threshold=7.192e+01, percent-clipped=0.0
2023-12-23 08:42:08,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1048480.0, ans=0.0
2023-12-23 08:42:10,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1048480.0, ans=0.125
2023-12-23 08:42:11,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0
2023-12-23 08:42:35,317 INFO [train.py:886] (1/4) Epoch 34, batch 0, loss[loss=0.03254, audio_tagging_loss=0.03254, over 21570.00 frames. ], tot_loss[loss=0.03254, audio_tagging_loss=0.03254, over 21570.00 frames. ], batch size: 107, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:42:35,317 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 08:42:56,423 INFO [train.py:917] (1/4) Epoch 34, validation: loss=0.03363, audio_tagging_loss=0.03363, over 3737520.00 frames.
2023-12-23 08:42:56,423 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 08:43:32,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048720.0, ans=0.1
2023-12-23 08:43:34,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5
2023-12-23 08:43:45,945 INFO [train.py:886] (1/4) Epoch 34, batch 50, loss[loss=0.01483, audio_tagging_loss=0.01483, over 25000.00 frames. ], tot_loss[loss=0.01932, audio_tagging_loss=0.01932, over 1113317.91 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:43:53,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1048853.3333333333, ans=0.0
2023-12-23 08:44:00,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1048920.0, ans=0.2
2023-12-23 08:44:06,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1048986.6666666667, ans=0.125
2023-12-23 08:44:10,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1048986.6666666667, ans=0.125
2023-12-23 08:44:11,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1048986.6666666667, ans=0.125
2023-12-23 08:44:12,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1048986.6666666667, ans=0.2
2023-12-23 08:44:18,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1049053.3333333333, ans=0.125
2023-12-23 08:44:18,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5
2023-12-23 08:44:22,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1049053.3333333333, ans=0.125
2023-12-23 08:44:28,833 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 4.024e+01 4.370e+01 4.886e+01 9.756e+01, threshold=8.739e+01, percent-clipped=6.0
2023-12-23 08:44:37,894 INFO [train.py:886] (1/4) Epoch 34, batch 100, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 1971493.70 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:44:41,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1049186.6666666667, ans=0.125
2023-12-23 08:44:46,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=12.0
2023-12-23 08:44:47,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1049253.3333333333, ans=0.1
2023-12-23 08:44:49,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0
2023-12-23 08:45:18,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0
2023-12-23 08:45:22,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1049453.3333333333, ans=0.1
2023-12-23 08:45:22,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1049453.3333333333, ans=0.125
2023-12-23 08:45:27,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5
2023-12-23 08:45:28,901 INFO [train.py:886] (1/4) Epoch 34, batch 150, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 2634758.05 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:45:43,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1049586.6666666667, ans=0.09899494936611666
2023-12-23 08:45:54,412 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.546e-03
2023-12-23 08:46:08,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049720.0, ans=0.1
2023-12-23 08:46:11,369 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.202e+01 3.487e+01 3.657e+01 3.856e+01 4.371e+01, threshold=7.314e+01, percent-clipped=0.0
2023-12-23 08:46:14,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1049786.6666666667, ans=0.07
2023-12-23 08:46:16,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1049786.6666666667, ans=0.125
2023-12-23 08:46:19,919 INFO [train.py:886] (1/4) Epoch 34, batch 200, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 3153829.48 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:46:24,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1049853.3333333333, ans=0.05
2023-12-23 08:46:25,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1049853.3333333333, ans=0.125
2023-12-23 08:46:28,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049920.0, ans=0.1
2023-12-23 08:46:34,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1049920.0, ans=0.125
2023-12-23 08:46:37,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1049920.0, ans=0.09899494936611666
2023-12-23 08:47:03,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1050120.0, ans=0.125
2023-12-23 08:47:03,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1050120.0, ans=0.04949747468305833
2023-12-23 08:47:10,936 INFO [train.py:886] (1/4) Epoch 34, batch 250, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 3555820.20 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:47:12,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1050186.6666666667, ans=0.0
2023-12-23 08:47:15,017 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:47:15,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0
2023-12-23 08:47:17,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0
2023-12-23 08:47:43,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1050386.6666666667, ans=0.1
2023-12-23 08:47:44,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1050386.6666666667, ans=0.125
2023-12-23 08:47:51,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1050453.3333333333, ans=0.0
2023-12-23 08:47:51,928 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.410e+01 3.573e+01 3.673e+01 4.532e+01, threshold=7.147e+01, percent-clipped=0.0
2023-12-23 08:47:57,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1050453.3333333333, ans=0.125
2023-12-23 08:48:00,545 INFO [train.py:886] (1/4) Epoch 34, batch 300, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 3864524.25 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:48:14,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5
2023-12-23 08:48:20,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.56 vs. limit=22.5
2023-12-23 08:48:37,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1050720.0, ans=0.035
2023-12-23 08:48:43,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1050786.6666666667, ans=0.125
2023-12-23 08:48:43,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1050786.6666666667, ans=10.0
2023-12-23 08:48:52,560 INFO [train.py:886] (1/4) Epoch 34, batch 350, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4102403.50 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:48:55,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1050853.3333333333, ans=0.2
2023-12-23 08:49:20,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1050986.6666666667, ans=0.0
2023-12-23 08:49:34,322 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.391e+01 3.531e+01 3.690e+01 4.649e+01, threshold=7.063e+01, percent-clipped=0.0
2023-12-23 08:49:34,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051120.0, ans=0.1
2023-12-23 08:49:44,269 INFO [train.py:886] (1/4) Epoch 34, batch 400, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4289257.31 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:49:44,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5
2023-12-23 08:49:50,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1051186.6666666667, ans=0.125
2023-12-23 08:49:54,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051253.3333333333, ans=0.1
2023-12-23 08:50:06,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1051320.0, ans=0.125
2023-12-23 08:50:09,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1051320.0, ans=0.2
2023-12-23 08:50:10,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1051320.0, ans=0.2
2023-12-23 08:50:24,020 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:50:36,047 INFO [train.py:886] (1/4) Epoch 34, batch 450, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4437392.07 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:50:36,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. limit=10.0
2023-12-23 08:51:06,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1051720.0, ans=0.2
2023-12-23 08:51:07,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=12.0
2023-12-23 08:51:09,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0
2023-12-23 08:51:17,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1051786.6666666667, ans=0.125
2023-12-23 08:51:18,111 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.014e+01 3.386e+01 3.481e+01 3.688e+01 4.054e+01, threshold=6.962e+01, percent-clipped=0.0
2023-12-23 08:51:24,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1051786.6666666667, ans=0.1
2023-12-23 08:51:26,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0
2023-12-23 08:51:27,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1051786.6666666667, ans=0.2
2023-12-23 08:51:28,814 INFO [train.py:886] (1/4) Epoch 34, batch 500, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4556210.16 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:51:38,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1051920.0, ans=0.0
2023-12-23 08:51:40,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1051920.0, ans=0.0
2023-12-23 08:51:51,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1051986.6666666667, ans=0.0
2023-12-23 08:52:01,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1052053.3333333333, ans=0.125
2023-12-23 08:52:16,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1052120.0, ans=0.0
2023-12-23 08:52:19,581 INFO [train.py:886] (1/4) Epoch 34, batch 550, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4649087.00 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:52:53,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052386.6666666667, ans=0.1
2023-12-23 08:52:59,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1052386.6666666667, ans=0.125
2023-12-23 08:53:03,276 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.111e+01 3.451e+01 3.642e+01 3.802e+01 4.281e+01, threshold=7.285e+01, percent-clipped=0.0
2023-12-23 08:53:09,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1052453.3333333333, ans=0.0
2023-12-23 08:53:12,566 INFO [train.py:886] (1/4) Epoch 34, batch 600, loss[loss=0.01086, audio_tagging_loss=0.01086, over 21900.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4709793.65 frames. ], batch size: 107, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:53:24,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1052586.6666666667, ans=0.05
2023-12-23 08:53:30,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1052586.6666666667, ans=0.0
2023-12-23 08:53:46,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=10.0
2023-12-23 08:53:52,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1052786.6666666667, ans=0.125
2023-12-23 08:54:04,392 INFO [train.py:886] (1/4) Epoch 34, batch 650, loss[loss=0.0108, audio_tagging_loss=0.0108, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4759753.19 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:54:04,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1052853.3333333333, ans=0.02
2023-12-23 08:54:12,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0
2023-12-23 08:54:26,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5
2023-12-23 08:54:30,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.62 vs. limit=10.0
2023-12-23 08:54:34,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1053053.3333333333, ans=0.015
2023-12-23 08:54:46,592 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.392e+01 3.565e+01 3.715e+01 5.032e+01, threshold=7.129e+01, percent-clipped=0.0
2023-12-23 08:54:46,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1053120.0, ans=0.125
2023-12-23 08:54:47,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1053120.0, ans=0.125
2023-12-23 08:54:55,083 INFO [train.py:886] (1/4) Epoch 34, batch 700, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4797512.09 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:54:58,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1053186.6666666667, ans=0.125
2023-12-23 08:55:14,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1053253.3333333333, ans=0.05
2023-12-23 08:55:15,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1053320.0, ans=0.0
2023-12-23 08:55:15,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0
2023-12-23 08:55:19,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1053320.0, ans=0.125
2023-12-23 08:55:22,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1053320.0, ans=0.125
2023-12-23 08:55:47,838 INFO [train.py:886] (1/4) Epoch 34, batch 750, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4830282.99 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:56:06,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0
2023-12-23 08:56:18,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1053720.0, ans=0.2
2023-12-23 08:56:30,775 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.054e+01 3.393e+01 3.521e+01 3.705e+01 4.133e+01, threshold=7.041e+01, percent-clipped=0.0
2023-12-23 08:56:31,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=1053786.6666666667, ans=12.0
2023-12-23 08:56:32,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1053786.6666666667, ans=0.2
2023-12-23 08:56:34,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1053786.6666666667, ans=0.0
2023-12-23 08:56:40,023 INFO [train.py:886] (1/4) Epoch 34, batch 800, loss[loss=0.01069, audio_tagging_loss=0.01069, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4860624.16 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:56:42,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=12.0
2023-12-23 08:56:48,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1053853.3333333333, ans=0.0
2023-12-23 08:57:01,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1053986.6666666667, ans=0.125
2023-12-23 08:57:09,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1053986.6666666667, ans=0.125
2023-12-23 08:57:32,039 INFO [train.py:886] (1/4) Epoch 34, batch 850, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4881031.68 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:57:57,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1054320.0, ans=0.125
2023-12-23 08:58:13,917 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:58:14,561 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.443e+01 3.585e+01 3.750e+01 4.520e+01, threshold=7.170e+01, percent-clipped=0.0
2023-12-23 08:58:25,631 INFO [train.py:886] (1/4) Epoch 34, batch 900, loss[loss=0.01274, audio_tagging_loss=0.01274, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4897773.76 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:58:43,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0
2023-12-23 08:59:04,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1054720.0, ans=0.1
2023-12-23 08:59:04,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1054720.0, ans=0.125
2023-12-23 08:59:16,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1054853.3333333333, ans=0.125
2023-12-23 08:59:17,002 INFO [train.py:886] (1/4) Epoch 34, batch 950, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4898465.94 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:59:17,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1054853.3333333333, ans=0.125
2023-12-23 08:59:21,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1054853.3333333333, ans=0.125
2023-12-23 08:59:24,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1054853.3333333333, ans=0.0
2023-12-23 08:59:34,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1054920.0, ans=0.0
2023-12-23 09:00:00,944 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.950e+01 3.447e+01 3.600e+01 3.803e+01 4.759e+01, threshold=7.201e+01, percent-clipped=0.0
2023-12-23 09:00:01,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1055120.0, ans=0.2
2023-12-23 09:00:03,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1055120.0, ans=0.0
2023-12-23 09:00:03,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1055120.0, ans=0.2
2023-12-23 09:00:04,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:00:09,512 INFO [train.py:886] (1/4) Epoch 34, batch 1000, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4905490.53 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:00:11,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1055186.6666666667, ans=0.125
2023-12-23 09:00:19,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1055253.3333333333, ans=0.0
2023-12-23 09:00:39,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1055320.0, ans=0.07
2023-12-23 09:00:43,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1055386.6666666667, ans=0.07
2023-12-23 09:00:47,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1055386.6666666667, ans=0.0
2023-12-23 09:00:47,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1055386.6666666667, ans=0.95
2023-12-23 09:00:53,308 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:01:01,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1055520.0, ans=0.125
2023-12-23 09:01:02,091 INFO [train.py:886] (1/4) Epoch 34, batch 1050, loss[loss=0.01351, audio_tagging_loss=0.01351, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4910962.37 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:01:02,526 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5
2023-12-23 09:01:09,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2023-12-23 09:01:13,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1055586.6666666667, ans=0.125
2023-12-23 09:01:25,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055653.3333333333, ans=0.1
2023-12-23 09:01:40,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1055720.0, ans=0.1
2023-12-23 09:01:44,523 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.130e+01 3.411e+01 3.556e+01 3.695e+01 4.710e+01, threshold=7.113e+01, percent-clipped=0.0
2023-12-23 09:01:53,125 INFO [train.py:886] (1/4) Epoch 34, batch 1100, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24905.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4921046.47 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:02:12,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055920.0, ans=0.1
2023-12-23 09:02:20,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.44 vs. limit=5.0
2023-12-23 09:02:23,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1056053.3333333333, ans=0.07
2023-12-23 09:02:27,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1056053.3333333333, ans=0.125
2023-12-23 09:02:28,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1056053.3333333333, ans=0.0
2023-12-23 09:02:35,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1056120.0, ans=0.0
2023-12-23 09:02:46,090 INFO [train.py:886] (1/4) Epoch 34, batch 1150, loss[loss=0.01001, audio_tagging_loss=0.01001, over 24028.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4930437.48 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:03:11,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1056320.0, ans=0.125
2023-12-23 09:03:19,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0
2023-12-23 09:03:19,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5
2023-12-23 09:03:27,716 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.380e+01 3.484e+01 3.660e+01 4.361e+01, threshold=6.968e+01, percent-clipped=0.0
2023-12-23 09:03:36,206 INFO [train.py:886] (1/4) Epoch 34, batch 1200, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4937746.41 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:03:39,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1056520.0, ans=0.125
2023-12-23 09:03:44,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1056520.0, ans=0.2
2023-12-23 09:03:44,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1056520.0, ans=0.125
2023-12-23 09:03:59,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1056653.3333333333, ans=0.125
2023-12-23 09:04:04,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1056653.3333333333, ans=0.0
2023-12-23 09:04:27,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1056853.3333333333, ans=0.125
2023-12-23 09:04:28,009 INFO [train.py:886] (1/4) Epoch 34, batch 1250, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4942955.90 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:04:31,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1056853.3333333333, ans=0.125
2023-12-23 09:04:32,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0
2023-12-23 09:04:46,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1056920.0, ans=0.125
2023-12-23 09:04:46,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1056920.0, ans=0.125
2023-12-23 09:04:50,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1056986.6666666667, ans=0.2
2023-12-23 09:04:52,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1056986.6666666667, ans=0.0
2023-12-23 09:05:09,584 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.461e+01 3.581e+01 3.707e+01 4.566e+01, threshold=7.161e+01, percent-clipped=0.0
2023-12-23 09:05:12,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1057120.0, ans=0.125
2023-12-23 09:05:20,298 INFO [train.py:886] (1/4) Epoch 34, batch 1300, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4944900.67 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:05:26,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.08 vs. limit=8.0
2023-12-23 09:05:33,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1057253.3333333333, ans=0.0
2023-12-23 09:05:35,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1057253.3333333333, ans=0.125
2023-12-23 09:05:36,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1057253.3333333333, ans=0.0
2023-12-23 09:05:49,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1057320.0, ans=0.2
2023-12-23 09:06:10,400 INFO [train.py:886] (1/4) Epoch 34, batch 1350, loss[loss=0.01259, audio_tagging_loss=0.01259, over 24023.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4947446.85 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:06:22,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1057586.6666666667, ans=0.0
2023-12-23 09:06:36,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1057653.3333333333, ans=0.125
2023-12-23 09:06:36,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0
2023-12-23 09:06:42,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1057720.0, ans=0.0
2023-12-23 09:06:53,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1057786.6666666667, ans=0.125
2023-12-23 09:06:54,021 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.110e+01 3.373e+01 3.475e+01 3.645e+01 4.225e+01, threshold=6.949e+01, percent-clipped=0.0
2023-12-23 09:06:55,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0
2023-12-23 09:07:00,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1057786.6666666667, ans=0.125
2023-12-23 09:07:03,275 INFO [train.py:886] (1/4) Epoch 34, batch 1400, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4945573.13 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 09:07:13,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1057920.0, ans=0.2
2023-12-23 09:07:27,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1057986.6666666667, ans=0.1
2023-12-23 09:07:54,254 INFO [train.py:886] (1/4) Epoch 34, batch 1450, loss[loss=0.01062, audio_tagging_loss=0.01062, over 21957.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4950469.16 frames. ], batch size: 107, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:08:03,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1058186.6666666667, ans=0.1
2023-12-23 09:08:15,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=12.0
2023-12-23 09:08:16,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1058320.0, ans=0.2
2023-12-23 09:08:18,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1058320.0, ans=10.0
2023-12-23 09:08:25,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1058386.6666666667, ans=0.0
2023-12-23 09:08:30,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1058386.6666666667, ans=0.2
2023-12-23 09:08:37,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.058e+01 3.393e+01 3.506e+01 3.631e+01 4.312e+01, threshold=7.011e+01, percent-clipped=0.0
2023-12-23 09:08:46,594 INFO [train.py:886] (1/4) Epoch 34, batch 1500, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4954395.78 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:08:46,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1058520.0, ans=0.125
2023-12-23 09:08:50,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2023-12-23 09:08:52,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2023-12-23 09:09:00,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1058586.6666666667, ans=0.125
2023-12-23 09:09:03,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1058586.6666666667, ans=0.125
2023-12-23 09:09:10,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1058653.3333333333, ans=0.125
2023-12-23 09:09:17,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1058720.0, ans=0.125
2023-12-23 09:09:21,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1058720.0, ans=0.2
2023-12-23 09:09:21,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0
2023-12-23 09:09:26,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0
2023-12-23 09:09:29,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1058786.6666666667, ans=0.125
2023-12-23 09:09:38,131 INFO [train.py:886] (1/4) Epoch 34, batch 1550, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4951300.59 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:09:56,777 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:10:10,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0
2023-12-23 09:10:12,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1059053.3333333333, ans=0.125
2023-12-23 09:10:17,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059053.3333333333, ans=0.1
2023-12-23 09:10:21,188 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.143e+01 3.468e+01 3.577e+01 3.734e+01 4.228e+01, threshold=7.153e+01, percent-clipped=0.0
2023-12-23 09:10:29,766 INFO [train.py:886] (1/4) Epoch 34, batch 1600, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4944705.67 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:10:36,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1059186.6666666667, ans=0.0
2023-12-23 09:10:36,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1059186.6666666667, ans=0.125
2023-12-23 09:10:50,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0
2023-12-23 09:10:57,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=12.0
2023-12-23 09:11:20,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1059453.3333333333, ans=0.1
2023-12-23 09:11:22,330 INFO [train.py:886] (1/4) Epoch 34, batch 1650, loss[loss=0.01506, audio_tagging_loss=0.01506, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4944539.46 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:11:34,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1059586.6666666667, ans=0.125
2023-12-23 09:11:40,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059586.6666666667, ans=0.1
2023-12-23 09:11:42,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1059653.3333333333, ans=0.125
2023-12-23 09:12:03,326 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.325e+01 3.519e+01 3.722e+01 4.583e+01, threshold=7.038e+01, percent-clipped=0.0
2023-12-23 09:12:09,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0
2023-12-23 09:12:09,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=15.0
2023-12-23 09:12:13,267 INFO [train.py:886] (1/4) Epoch 34, batch 1700, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4940514.45 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:12:13,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1059853.3333333333, ans=0.2
2023-12-23 09:12:24,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1059920.0, ans=0.0
2023-12-23 09:12:35,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1059986.6666666667, ans=0.95
2023-12-23 09:12:40,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059986.6666666667, ans=0.1
2023-12-23 09:12:42,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1059986.6666666667, ans=0.125
2023-12-23 09:12:43,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059986.6666666667, ans=0.1
2023-12-23 09:12:47,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5
2023-12-23 09:12:47,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1060053.3333333333, ans=0.0
2023-12-23 09:12:59,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1060120.0, ans=0.125
2023-12-23 09:13:05,129 INFO [train.py:886] (1/4) Epoch 34, batch 1750, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4944500.66 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:13:28,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1060320.0, ans=0.125
2023-12-23 09:13:37,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1060386.6666666667, ans=0.125
2023-12-23 09:13:47,291 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.064e+01 3.371e+01 3.527e+01 3.701e+01 4.388e+01, threshold=7.054e+01, percent-clipped=0.0
2023-12-23 09:13:54,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0
2023-12-23 09:13:57,077 INFO [train.py:886] (1/4) Epoch 34, batch 1800, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4951533.18 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:13:57,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1060520.0, ans=0.125
2023-12-23 09:14:02,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1060520.0, ans=0.2
2023-12-23 09:14:21,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5
2023-12-23 09:14:28,528 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:14:43,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1060786.6666666667, ans=0.125
2023-12-23 09:14:47,626 INFO [train.py:886] (1/4) Epoch 34, batch 1850, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4947394.80 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:14:48,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.75 vs. limit=10.0
2023-12-23 09:14:48,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1060853.3333333333, ans=0.1
2023-12-23 09:14:48,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060853.3333333333, ans=0.1
2023-12-23 09:14:59,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1060920.0, ans=0.0
2023-12-23 09:15:06,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.59 vs. limit=10.0
2023-12-23 09:15:15,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1060986.6666666667, ans=0.125
2023-12-23 09:15:17,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1061053.3333333333, ans=0.125
2023-12-23 09:15:30,277 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.414e+01 3.585e+01 3.770e+01 4.253e+01, threshold=7.170e+01, percent-clipped=0.0
2023-12-23 09:15:36,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=8.0
2023-12-23 09:15:37,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0
2023-12-23 09:15:40,225 INFO [train.py:886] (1/4) Epoch 34, batch 1900, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4942856.18 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:15:40,390 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:15:41,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0
2023-12-23 09:16:19,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1061386.6666666667, ans=0.125
2023-12-23 09:16:31,205 INFO [train.py:886] (1/4) Epoch 34, batch 1950, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4941229.61 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0
2023-12-23 09:17:08,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1061720.0, ans=0.125
2023-12-23 09:17:14,343 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.990e+01 3.384e+01 3.570e+01 3.708e+01 4.243e+01, threshold=7.140e+01, percent-clipped=0.0
2023-12-23 09:17:17,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1061786.6666666667, ans=0.0
2023-12-23 09:17:23,728 INFO [train.py:886] (1/4) Epoch 34, batch 2000, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4940461.16 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:17:23,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1061853.3333333333, ans=0.125
2023-12-23 09:17:24,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1061853.3333333333, ans=0.1
2023-12-23 09:17:31,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0
2023-12-23 09:17:31,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1061853.3333333333, ans=0.125
2023-12-23 09:18:06,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1062120.0, ans=0.0
2023-12-23 09:18:13,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1062120.0, ans=0.2
2023-12-23 09:18:16,320 INFO [train.py:886] (1/4) Epoch 34, batch 2050, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4948715.21 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:18:20,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1062186.6666666667, ans=0.0
2023-12-23 09:18:22,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1062186.6666666667, ans=0.125
2023-12-23 09:18:53,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1062386.6666666667, ans=0.05
2023-12-23 09:18:58,855 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.394e+01 3.550e+01 3.730e+01 4.740e+01, threshold=7.100e+01, percent-clipped=0.0
2023-12-23 09:19:02,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0
2023-12-23 09:19:07,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1062520.0, ans=0.0
2023-12-23 09:19:08,954 INFO [train.py:886] (1/4) Epoch 34, batch 2100, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4949476.75 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0
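The grad_scale field doubles from 32.0 to 64.0 at batch 2000 above. That is the fp16 loss-scaling factor: PyTorch's GradScaler doubles the scale after a fixed window of overflow-free steps and halves it when gradients overflow. A sketch of the standard torch.cuda.amp pattern, with model/optimizer/loss_fn as placeholders rather than the recipe's own objects:

    # fp16 loss scaling behind the "grad_scale" field; GradScaler's growth
    # rule (double after growth_interval clean steps) matches the observed
    # 32 -> 64 -> 128 jumps. Constants here are illustrative.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, features, targets, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model(features), targets)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grows or shrinks the scale
        return loss.detach(), scaler.get_scale()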
2023-12-23 09:19:12,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1062520.0, ans=0.0
2023-12-23 09:19:14,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1062520.0, ans=0.035
2023-12-23 09:19:38,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1062720.0, ans=0.0
2023-12-23 09:19:51,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0
2023-12-23 09:19:59,896 INFO [train.py:886] (1/4) Epoch 34, batch 2150, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4956237.69 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:20:04,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.47 vs. limit=5.0
2023-12-23 09:20:10,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1062920.0, ans=0.0
2023-12-23 09:20:25,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0
2023-12-23 09:20:42,091 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.426e+01 3.569e+01 3.716e+01 4.443e+01, threshold=7.139e+01, percent-clipped=0.0
2023-12-23 09:20:43,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5
2023-12-23 09:20:50,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1063120.0, ans=0.1
2023-12-23 09:20:51,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1063186.6666666667, ans=0.125
2023-12-23 09:20:52,024 INFO [train.py:886] (1/4) Epoch 34, batch 2200, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4952719.48 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:20:52,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0
2023-12-23 09:21:18,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1063320.0, ans=0.125
2023-12-23 09:21:30,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0
2023-12-23 09:21:31,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1063386.6666666667, ans=0.0
2023-12-23 09:21:43,782 INFO [train.py:886] (1/4) Epoch 34, batch 2250, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4948355.73 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0
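The WARNING lines from optim.py summarize the recent distribution of gradient norms (min, 25%, median, 75%, max) together with the clipping threshold in force and how often clipping actually fired. With Clipping_scale=2.0 the threshold tracks a multiple of a running median-like statistic. A simplified sketch of that bookkeeping (the real logic lives inside icefall's ScaledAdam; the class and window size here are illustrative):

    # Median-based gradient clipping in the spirit of the
    # "grad-norm quartiles ... threshold=..., percent-clipped=..." warnings.
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 500):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms: list = []
            self.num_clipped = 0
            self.num_steps = 0

        def clip_(self, params) -> float:
            grads = [p.grad.norm() for p in params if p.grad is not None]
            norm = torch.norm(torch.stack(grads)).item()
            self.norms = (self.norms + [norm])[-self.window:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            self.num_steps += 1
            if norm > threshold:  # scale all grads down to the threshold
                self.num_clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold

        def quartiles(self):
            s = sorted(self.norms)
            return [s[int(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]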
2023-12-23 09:21:52,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1063520.0, ans=0.0
2023-12-23 09:21:59,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1063586.6666666667, ans=0.125
2023-12-23 09:22:12,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1063653.3333333333, ans=0.0
2023-12-23 09:22:27,020 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.422e+01 3.558e+01 3.753e+01 4.731e+01, threshold=7.116e+01, percent-clipped=0.0
2023-12-23 09:22:31,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1063786.6666666667, ans=0.125
2023-12-23 09:22:36,333 INFO [train.py:886] (1/4) Epoch 34, batch 2300, loss[loss=0.0102, audio_tagging_loss=0.0102, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4947672.79 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:22:44,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1063853.3333333333, ans=0.0
2023-12-23 09:22:46,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1063920.0, ans=0.125
2023-12-23 09:23:09,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1064053.3333333333, ans=0.125
2023-12-23 09:23:10,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1064053.3333333333, ans=0.0
2023-12-23 09:23:11,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1064053.3333333333, ans=10.0
2023-12-23 09:23:18,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1064120.0, ans=0.0
2023-12-23 09:23:29,151 INFO [train.py:886] (1/4) Epoch 34, batch 2350, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4945484.86 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:23:43,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1064253.3333333333, ans=0.125
2023-12-23 09:24:11,376 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.992e+01 3.391e+01 3.504e+01 3.673e+01 5.436e+01, threshold=7.008e+01, percent-clipped=0.0
2023-12-23 09:24:12,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5
2023-12-23 09:24:19,902 INFO [train.py:886] (1/4) Epoch 34, batch 2400, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4953486.79 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:24:30,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0
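In the train.py lines, loss[...] is the current batch while tot_loss[...] is a frame-weighted aggregate with exponential forgetting, which is why its frame count hovers around five million. A sketch of how such an aggregate behaves, assuming a forgetting factor of 1/200 (a simplification of the tracker icefall's training loop maintains):

    # Frame-weighted running aggregate behind
    # "tot_loss[loss=..., over ... frames.]": each batch's loss is weighted
    # by its frame count and folded in with exponential decay.

    class RunningLoss:
        def __init__(self, reset_interval: int = 200):
            self.alpha = 1.0 / reset_interval
            self.loss_sum = 0.0   # decayed sum of (loss * frames)
            self.frames = 0.0     # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: float):
            decay = 1.0 - self.alpha
            self.loss_sum = self.loss_sum * decay + batch_loss * batch_frames
            self.frames = self.frames * decay + batch_frames
            return self.loss_sum / self.frames, self.frames

    avg = RunningLoss()
    for step in range(2000):
        tot, frames = avg.update(0.012, 25000.0)
    # tot stays ~0.012; frames saturates near 25000 * 200 = 5.0e6,
    # which matches the "over ~4.95e6 frames" figures in the log.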
2023-12-23 09:24:32,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1064586.6666666667, ans=0.125
2023-12-23 09:24:52,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1064720.0, ans=0.125
2023-12-23 09:24:52,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1064720.0, ans=0.05
2023-12-23 09:24:59,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064786.6666666667, ans=0.1
2023-12-23 09:25:07,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1064786.6666666667, ans=15.0
2023-12-23 09:25:10,217 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:25:10,992 INFO [train.py:886] (1/4) Epoch 34, batch 2450, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4955721.23 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0
2023-12-23 09:25:16,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.90 vs. limit=10.0
2023-12-23 09:25:19,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1064853.3333333333, ans=0.125
2023-12-23 09:25:20,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064920.0, ans=0.1
2023-12-23 09:25:21,267 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:25:52,955 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.052e+01 3.384e+01 3.531e+01 3.725e+01 4.723e+01, threshold=7.062e+01, percent-clipped=0.0
2023-12-23 09:26:01,421 INFO [train.py:886] (1/4) Epoch 34, batch 2500, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4954493.93 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:26:13,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1065253.3333333333, ans=0.0
2023-12-23 09:26:34,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0
2023-12-23 09:26:49,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1065453.3333333333, ans=0.1
2023-12-23 09:26:52,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065453.3333333333, ans=0.1
2023-12-23 09:26:54,408 INFO [train.py:886] (1/4) Epoch 34, batch 2550, loss[loss=0.01215, audio_tagging_loss=0.01215, over 22312.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4947515.58 frames. ], batch size: 107, lr: 3.16e-03, grad_scale: 64.0
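The lr column steps down from 3.17e-03 to 3.16e-03 at batch 2500 above and keeps creeping down afterwards; it decays smoothly in both batch index and epoch. A sketch with the shape of an Eden-style schedule, with placeholder constants; the recipe's actual scheduler in optim.py also handles warm-up and per-group scaling, so this illustrates only the decay shape, not the exact logged values:

    # Eden-style learning-rate rule: two smooth power-law decays, one in
    # batches and one in epochs. lr_batches / lr_epochs below are assumed,
    # illustrative knee points, not this run's verified configuration.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Deep into training, another ten thousand batches barely moves the lr,
    # which is why the column changes only in its third significant digit.
    print(eden_lr(0.045, batch=1_060_000, epoch=34.0))
    print(eden_lr(0.045, batch=1_070_000, epoch=34.1))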
2023-12-23 09:26:58,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=8.0
2023-12-23 09:27:23,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. limit=10.0
2023-12-23 09:27:23,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1065653.3333333333, ans=0.0
2023-12-23 09:27:24,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1065720.0, ans=0.0
2023-12-23 09:27:28,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1065720.0, ans=0.125
2023-12-23 09:27:31,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1065720.0, ans=0.2
2023-12-23 09:27:34,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1065786.6666666667, ans=0.1
2023-12-23 09:27:35,931 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.037e+01 3.394e+01 3.621e+01 3.808e+01 4.469e+01, threshold=7.242e+01, percent-clipped=0.0
2023-12-23 09:27:38,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1065786.6666666667, ans=0.125
2023-12-23 09:27:45,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. limit=15.0
2023-12-23 09:27:46,496 INFO [train.py:886] (1/4) Epoch 34, batch 2600, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4942887.93 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:27:46,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1065853.3333333333, ans=0.025
2023-12-23 09:27:52,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1065853.3333333333, ans=0.1
2023-12-23 09:27:52,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0
2023-12-23 09:27:53,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1065853.3333333333, ans=0.125
2023-12-23 09:27:55,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1065920.0, ans=0.125
2023-12-23 09:28:37,534 INFO [train.py:886] (1/4) Epoch 34, batch 2650, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4948702.78 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:29:20,770 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.338e+01 3.500e+01 3.671e+01 4.069e+01, threshold=7.000e+01, percent-clipped=0.0
2023-12-23 09:29:24,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1066453.3333333333, ans=0.0
2023-12-23 09:29:30,361 INFO [train.py:886] (1/4) Epoch 34, batch 2700, loss[loss=0.01444, audio_tagging_loss=0.01444, over 21767.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4947877.12 frames. ], batch size: 107, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:29:34,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1066520.0, ans=0.125
2023-12-23 09:29:37,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1066520.0, ans=0.0
2023-12-23 09:29:48,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1066586.6666666667, ans=0.0
2023-12-23 09:29:56,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1066653.3333333333, ans=0.125
2023-12-23 09:29:57,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5
2023-12-23 09:30:16,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0
2023-12-23 09:30:22,360 INFO [train.py:886] (1/4) Epoch 34, batch 2750, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4946220.82 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:30:55,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1067053.3333333333, ans=0.125
2023-12-23 09:30:58,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1067053.3333333333, ans=0.1
2023-12-23 09:30:59,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0
2023-12-23 09:31:03,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1067120.0, ans=0.2
2023-12-23 09:31:04,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1067120.0, ans=0.125
2023-12-23 09:31:05,355 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.165e+01 3.431e+01 3.588e+01 3.825e+01 4.310e+01, threshold=7.176e+01, percent-clipped=0.0
2023-12-23 09:31:05,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 09:31:13,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1067120.0, ans=0.05
2023-12-23 09:31:14,668 INFO [train.py:886] (1/4) Epoch 34, batch 2800, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4952456.97 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0
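The WithLoss diagnostics report the summed auxiliary loss attached to attention-weight tensors (0.000e+00 here, meaning the weights are currently within bounds). Attaching an auxiliary scalar loss to an intermediate tensor without changing its forward value can be done with a custom autograd function that passes the tensor through and feeds a gradient of one into the loss; a generic sketch of that trick, not the scaling.py implementation:

    # Pass-through function that folds an auxiliary loss into backprop:
    # forward returns x unchanged, backward sends gradient 1.0 into aux_loss,
    # so aux_loss is effectively added to the training objective.
    import torch

    class AttachLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.loss_shape = aux_loss.shape
            return x

        @staticmethod
        def backward(ctx, grad_out):
            ones = torch.ones(ctx.loss_shape, dtype=grad_out.dtype,
                              device=grad_out.device)
            return grad_out, ones

    x = torch.randn(4, 8, requires_grad=True)
    aux = (x ** 2).mean()          # some penalty computed from x
    y = AttachLoss.apply(x, aux)   # y equals x in value
    y.sum().backward()             # x.grad now also includes d(aux)/dx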
2023-12-23 09:31:31,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1067253.3333333333, ans=0.0
2023-12-23 09:31:32,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1067253.3333333333, ans=10.0
2023-12-23 09:31:32,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1067253.3333333333, ans=0.125
2023-12-23 09:32:07,299 INFO [train.py:886] (1/4) Epoch 34, batch 2850, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4943415.82 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:32:13,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1067520.0, ans=0.125
2023-12-23 09:32:31,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1067653.3333333333, ans=0.1
2023-12-23 09:32:39,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2023-12-23 09:32:40,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1067720.0, ans=0.07
2023-12-23 09:32:49,736 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.095e+01 3.409e+01 3.533e+01 3.696e+01 4.223e+01, threshold=7.066e+01, percent-clipped=0.0
2023-12-23 09:32:58,213 INFO [train.py:886] (1/4) Epoch 34, batch 2900, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4943390.72 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:33:04,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1067853.3333333333, ans=0.0
2023-12-23 09:33:05,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0
2023-12-23 09:33:10,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1067920.0, ans=0.0
2023-12-23 09:33:15,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0
2023-12-23 09:33:25,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1067986.6666666667, ans=0.125
2023-12-23 09:33:25,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1067986.6666666667, ans=0.125
2023-12-23 09:33:41,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1068120.0, ans=0.1
2023-12-23 09:33:44,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1068120.0, ans=0.125
2023-12-23 09:33:47,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1068120.0, ans=0.0
2023-12-23 09:33:50,601 INFO [train.py:886] (1/4) Epoch 34, batch 2950, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4946011.24 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:33:52,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1068186.6666666667, ans=10.0
2023-12-23 09:34:00,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2023-12-23 09:34:05,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1068253.3333333333, ans=0.125
2023-12-23 09:34:06,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1068253.3333333333, ans=0.0
2023-12-23 09:34:13,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1068320.0, ans=0.125
2023-12-23 09:34:31,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1068453.3333333333, ans=0.0
2023-12-23 09:34:32,919 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.362e+01 3.516e+01 3.709e+01 4.574e+01, threshold=7.032e+01, percent-clipped=0.0
2023-12-23 09:34:42,807 INFO [train.py:886] (1/4) Epoch 34, batch 3000, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4950408.22 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0
2023-12-23 09:34:42,808 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 09:35:04,055 INFO [train.py:917] (1/4) Epoch 34, validation: loss=0.03414, audio_tagging_loss=0.03414, over 3737520.00 frames.
2023-12-23 09:35:04,056 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
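At batch 3000 the loop pauses to compute the dev-set loss (over 3737520 frames) and then reports peak CUDA memory, as the three train.py lines above show. A sketch of such a periodic validation hook, with model/valid_dl/compute_loss as placeholders for the recipe's own objects:

    # Periodic validation in the style of the batch-3000 entries: every
    # valid_interval batches, evaluate the full dev set in no-grad mode,
    # log a frame-weighted loss, and report peak CUDA memory.
    import logging
    import torch

    def maybe_validate(model, valid_dl, compute_loss, batch_idx: int,
                       valid_interval: int = 3000):
        if batch_idx == 0 or batch_idx % valid_interval != 0:
            return
        logging.info("Computing validation loss")
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, frames = compute_loss(model, batch)
                tot_loss += loss * frames
                tot_frames += frames
        model.train()
        logging.info(f"validation: loss={tot_loss / tot_frames:.5f}, "
                     f"over {tot_frames:.2f} frames.")
        mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        logging.info(f"Maximum memory allocated so far is {mem_mb}MB")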
2023-12-23 09:35:04,056 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 09:35:12,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1068520.0, ans=0.1 2023-12-23 09:35:19,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1068586.6666666667, ans=0.2 2023-12-23 09:35:23,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1068653.3333333333, ans=0.5 2023-12-23 09:35:24,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1068653.3333333333, ans=0.125 2023-12-23 09:35:41,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2023-12-23 09:35:47,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1068786.6666666667, ans=0.0 2023-12-23 09:35:52,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1068786.6666666667, ans=0.0 2023-12-23 09:35:54,418 INFO [train.py:886] (1/4) Epoch 34, batch 3050, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4951940.71 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:36:00,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1068853.3333333333, ans=0.0 2023-12-23 09:36:36,790 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.409e+01 3.553e+01 3.721e+01 5.104e+01, threshold=7.106e+01, percent-clipped=0.0 2023-12-23 09:36:37,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1069120.0, ans=0.0 2023-12-23 09:36:45,910 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:36:46,701 INFO [train.py:886] (1/4) Epoch 34, batch 3100, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4959633.78 frames. 
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:37:05,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1069320.0, ans=0.125 2023-12-23 09:37:19,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1069386.6666666667, ans=0.09899494936611666 2023-12-23 09:37:21,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1069386.6666666667, ans=0.125 2023-12-23 09:37:23,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1069386.6666666667, ans=0.125 2023-12-23 09:37:31,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1069453.3333333333, ans=0.035 2023-12-23 09:37:31,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1069453.3333333333, ans=0.125 2023-12-23 09:37:31,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1069453.3333333333, ans=0.07 2023-12-23 09:37:35,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1069453.3333333333, ans=0.2 2023-12-23 09:37:37,804 INFO [train.py:886] (1/4) Epoch 34, batch 3150, loss[loss=0.01182, audio_tagging_loss=0.01182, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4946403.15 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:37:45,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1069520.0, ans=0.125 2023-12-23 09:37:53,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1069586.6666666667, ans=0.1 2023-12-23 09:37:54,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1069586.6666666667, ans=0.05 2023-12-23 09:38:17,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1069720.0, ans=0.125 2023-12-23 09:38:19,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-12-23 09:38:21,537 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.098e+01 3.466e+01 3.603e+01 3.773e+01 5.785e+01, threshold=7.206e+01, percent-clipped=0.0 2023-12-23 09:38:30,593 INFO [train.py:886] (1/4) Epoch 34, batch 3200, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4941849.80 frames. 
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:38:53,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1069986.6666666667, ans=0.125 2023-12-23 09:38:57,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1069986.6666666667, ans=0.0 2023-12-23 09:39:01,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1070053.3333333333, ans=0.125 2023-12-23 09:39:07,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1070053.3333333333, ans=0.2 2023-12-23 09:39:10,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1070120.0, ans=0.125 2023-12-23 09:39:21,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1070120.0, ans=0.2 2023-12-23 09:39:23,028 INFO [train.py:886] (1/4) Epoch 34, batch 3250, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4945102.20 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:39:25,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1070186.6666666667, ans=0.5 2023-12-23 09:39:41,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1070320.0, ans=0.07 2023-12-23 09:39:44,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1070320.0, ans=0.125 2023-12-23 09:39:47,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1070320.0, ans=0.125 2023-12-23 09:39:50,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1070320.0, ans=0.125 2023-12-23 09:40:05,134 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.908e+01 3.341e+01 3.558e+01 3.738e+01 4.204e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 09:40:13,618 INFO [train.py:886] (1/4) Epoch 34, batch 3300, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4945040.30 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:40:25,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1070586.6666666667, ans=0.125 2023-12-23 09:40:27,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1070586.6666666667, ans=0.125 2023-12-23 09:40:36,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=8.0 2023-12-23 09:40:49,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. 
limit=15.0 2023-12-23 09:40:52,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1070720.0, ans=0.2 2023-12-23 09:41:03,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-12-23 09:41:05,513 INFO [train.py:886] (1/4) Epoch 34, batch 3350, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4954868.79 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:41:13,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2023-12-23 09:41:14,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-12-23 09:41:23,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1070986.6666666667, ans=0.0 2023-12-23 09:41:47,082 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.108e+01 3.417e+01 3.570e+01 3.745e+01 4.410e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 09:41:55,625 INFO [train.py:886] (1/4) Epoch 34, batch 3400, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4958436.22 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:41:56,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1071186.6666666667, ans=0.05 2023-12-23 09:42:01,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1071186.6666666667, ans=0.1 2023-12-23 09:42:03,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1071186.6666666667, ans=10.0 2023-12-23 09:42:17,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1071320.0, ans=0.05 2023-12-23 09:42:21,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1071320.0, ans=0.2 2023-12-23 09:42:28,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071386.6666666667, ans=0.1 2023-12-23 09:42:35,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1071386.6666666667, ans=0.2 2023-12-23 09:42:41,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1071453.3333333333, ans=0.2 2023-12-23 09:42:48,485 INFO [train.py:886] (1/4) Epoch 34, batch 3450, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4958370.40 frames. 
], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:42:48,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1071520.0, ans=0.125 2023-12-23 09:42:59,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1071586.6666666667, ans=0.125 2023-12-23 09:42:59,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2023-12-23 09:43:00,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1071586.6666666667, ans=0.0 2023-12-23 09:43:17,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1071653.3333333333, ans=0.2 2023-12-23 09:43:17,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1071653.3333333333, ans=0.125 2023-12-23 09:43:20,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.21 vs. limit=10.0 2023-12-23 09:43:29,990 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.457e+01 3.606e+01 3.781e+01 4.224e+01, threshold=7.212e+01, percent-clipped=0.0 2023-12-23 09:43:40,610 INFO [train.py:886] (1/4) Epoch 34, batch 3500, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4952504.69 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:43:50,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1071920.0, ans=0.0 2023-12-23 09:43:51,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1071920.0, ans=0.1 2023-12-23 09:44:01,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=12.0 2023-12-23 09:44:10,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1072053.3333333333, ans=0.0 2023-12-23 09:44:19,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1072053.3333333333, ans=0.125 2023-12-23 09:44:28,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1072120.0, ans=0.125 2023-12-23 09:44:31,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1072186.6666666667, ans=0.1 2023-12-23 09:44:31,681 INFO [train.py:886] (1/4) Epoch 34, batch 3550, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4952801.02 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:44:40,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.33 vs. 
limit=22.5 2023-12-23 09:44:48,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-23 09:44:52,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1072320.0, ans=0.125 2023-12-23 09:45:04,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1072386.6666666667, ans=0.125 2023-12-23 09:45:14,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1072453.3333333333, ans=0.5 2023-12-23 09:45:14,667 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.056e+01 3.412e+01 3.528e+01 3.703e+01 4.575e+01, threshold=7.057e+01, percent-clipped=0.0 2023-12-23 09:45:19,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1072453.3333333333, ans=0.125 2023-12-23 09:45:24,647 INFO [train.py:886] (1/4) Epoch 34, batch 3600, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4951418.94 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:45:32,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1072520.0, ans=0.125 2023-12-23 09:45:36,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-12-23 09:46:00,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1072720.0, ans=0.0 2023-12-23 09:46:14,856 INFO [train.py:886] (1/4) Epoch 34, batch 3650, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24041.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4954227.40 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:46:21,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1072853.3333333333, ans=0.125 2023-12-23 09:46:48,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1073053.3333333333, ans=0.125 2023-12-23 09:46:57,543 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.017e+01 3.373e+01 3.534e+01 3.700e+01 5.262e+01, threshold=7.068e+01, percent-clipped=0.0 2023-12-23 09:47:06,753 INFO [train.py:886] (1/4) Epoch 34, batch 3700, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4958575.26 frames. 
], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:47:31,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1073320.0, ans=0.125 2023-12-23 09:47:32,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073320.0, ans=0.1 2023-12-23 09:47:45,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1073386.6666666667, ans=0.05 2023-12-23 09:47:54,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1073453.3333333333, ans=0.125 2023-12-23 09:47:58,695 INFO [train.py:886] (1/4) Epoch 34, batch 3750, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4957980.87 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:48:11,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1073586.6666666667, ans=0.125 2023-12-23 09:48:25,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1073653.3333333333, ans=0.125 2023-12-23 09:48:29,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-12-23 09:48:32,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1073720.0, ans=0.0 2023-12-23 09:48:35,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1073720.0, ans=0.125 2023-12-23 09:48:42,085 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.136e+01 3.444e+01 3.639e+01 3.754e+01 4.905e+01, threshold=7.278e+01, percent-clipped=0.0 2023-12-23 09:48:49,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1073786.6666666667, ans=0.0 2023-12-23 09:48:50,769 INFO [train.py:886] (1/4) Epoch 34, batch 3800, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4949924.43 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:49:02,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1073920.0, ans=0.0 2023-12-23 09:49:10,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1073920.0, ans=0.025 2023-12-23 09:49:42,834 INFO [train.py:886] (1/4) Epoch 34, batch 3850, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4943223.56 frames. 
], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:49:43,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1074186.6666666667, ans=0.125 2023-12-23 09:49:48,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1074186.6666666667, ans=0.1 2023-12-23 09:49:49,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1074186.6666666667, ans=10.0 2023-12-23 09:50:02,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-12-23 09:50:03,551 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:50:23,464 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.418e+01 3.601e+01 3.736e+01 4.189e+01, threshold=7.201e+01, percent-clipped=0.0 2023-12-23 09:50:32,670 INFO [train.py:886] (1/4) Epoch 34, batch 3900, loss[loss=0.01489, audio_tagging_loss=0.01489, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4935445.59 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:50:51,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1074586.6666666667, ans=0.0 2023-12-23 09:50:54,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1074653.3333333333, ans=0.125 2023-12-23 09:51:05,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1074720.0, ans=0.125 2023-12-23 09:51:06,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-12-23 09:51:24,469 INFO [train.py:886] (1/4) Epoch 34, batch 3950, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4944286.95 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:51:40,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1074920.0, ans=0.125 2023-12-23 09:51:44,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1074986.6666666667, ans=0.5 2023-12-23 09:51:51,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2023-12-23 09:52:07,665 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.354e+01 3.526e+01 3.698e+01 4.132e+01, threshold=7.051e+01, percent-clipped=0.0 2023-12-23 09:52:16,943 INFO [train.py:886] (1/4) Epoch 34, batch 4000, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4942327.71 frames. 
], batch size: 100, lr: 3.15e-03, grad_scale: 128.0 2023-12-23 09:52:24,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1075186.6666666667, ans=0.125 2023-12-23 09:52:31,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-12-23 09:53:00,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-12-23 09:53:01,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1075453.3333333333, ans=0.125 2023-12-23 09:53:08,053 INFO [train.py:886] (1/4) Epoch 34, batch 4050, loss[loss=0.01484, audio_tagging_loss=0.01484, over 24954.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4947793.95 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:53:08,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1075520.0, ans=0.125 2023-12-23 09:53:22,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2023-12-23 09:53:26,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1075586.6666666667, ans=0.125 2023-12-23 09:53:51,621 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.048e+01 3.432e+01 3.586e+01 3.724e+01 5.445e+01, threshold=7.172e+01, percent-clipped=0.0 2023-12-23 09:53:59,194 INFO [train.py:886] (1/4) Epoch 34, batch 4100, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24053.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4942844.22 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:54:12,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1075920.0, ans=0.1 2023-12-23 09:54:34,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1076053.3333333333, ans=0.1 2023-12-23 09:54:52,434 INFO [train.py:886] (1/4) Epoch 34, batch 4150, loss[loss=0.008585, audio_tagging_loss=0.008585, over 24041.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4946045.95 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:55:18,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2023-12-23 09:55:27,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1076386.6666666667, ans=0.125 2023-12-23 09:55:34,922 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+01 3.411e+01 3.544e+01 3.753e+01 4.282e+01, threshold=7.087e+01, percent-clipped=0.0 2023-12-23 09:55:42,524 INFO [train.py:886] (1/4) Epoch 34, batch 4200, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4949879.95 frames. 
], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:55:49,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-12-23 09:56:12,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1076653.3333333333, ans=0.125 2023-12-23 09:56:13,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=8.0 2023-12-23 09:56:14,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1076720.0, ans=0.0 2023-12-23 09:56:19,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1076720.0, ans=0.2 2023-12-23 09:56:22,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1076720.0, ans=0.0 2023-12-23 09:56:29,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1076786.6666666667, ans=0.2 2023-12-23 09:56:35,488 INFO [train.py:886] (1/4) Epoch 34, batch 4250, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4950294.80 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:56:36,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-12-23 09:56:36,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1076853.3333333333, ans=0.0 2023-12-23 09:56:47,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2023-12-23 09:56:48,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1076920.0, ans=0.125 2023-12-23 09:56:48,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.34 vs. limit=22.5 2023-12-23 09:57:18,334 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.979e+01 3.377e+01 3.552e+01 3.782e+01 4.205e+01, threshold=7.103e+01, percent-clipped=0.0 2023-12-23 09:57:26,362 INFO [train.py:886] (1/4) Epoch 34, batch 4300, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4954683.50 frames. 
], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:57:30,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1077186.6666666667, ans=0.09899494936611666 2023-12-23 09:57:36,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1077253.3333333333, ans=0.125 2023-12-23 09:57:42,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1077253.3333333333, ans=0.125 2023-12-23 09:57:47,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1077320.0, ans=0.125 2023-12-23 09:57:50,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1077320.0, ans=0.2 2023-12-23 09:58:07,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1077453.3333333333, ans=0.09899494936611666 2023-12-23 09:58:17,344 INFO [train.py:886] (1/4) Epoch 34, batch 4350, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4958689.93 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:58:27,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1077586.6666666667, ans=0.2 2023-12-23 09:58:30,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1077586.6666666667, ans=0.125 2023-12-23 09:58:54,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1077720.0, ans=0.07 2023-12-23 09:58:55,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1077720.0, ans=0.125 2023-12-23 09:59:00,728 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.247e+01 3.499e+01 3.632e+01 3.842e+01 4.825e+01, threshold=7.264e+01, percent-clipped=0.0 2023-12-23 09:59:09,495 INFO [train.py:886] (1/4) Epoch 34, batch 4400, loss[loss=0.01031, audio_tagging_loss=0.01031, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4958816.07 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:59:14,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1077853.3333333333, ans=0.2 2023-12-23 09:59:18,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1077920.0, ans=0.0 2023-12-23 09:59:24,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1077920.0, ans=0.125 2023-12-23 09:59:25,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2023-12-23 09:59:37,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1077986.6666666667, ans=0.0 2023-12-23 09:59:39,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. 
2023-12-23 09:59:40,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1078053.3333333333, ans=0.0 2023-12-23 09:59:40,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1078053.3333333333, ans=0.125 2023-12-23 09:59:41,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1078053.3333333333, ans=0.09899494936611666 2023-12-23 09:59:59,439 INFO [train.py:886] (1/4) Epoch 34, batch 4450, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4955976.57 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 10:00:13,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-12-23 10:00:15,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1078253.3333333333, ans=0.0 2023-12-23 10:00:16,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1078253.3333333333, ans=0.125 2023-12-23 10:00:23,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1078320.0, ans=0.0 2023-12-23 10:00:45,202 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.445e+01 3.585e+01 3.809e+01 4.204e+01, threshold=7.171e+01, percent-clipped=0.0 2023-12-23 10:00:45,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1078453.3333333333, ans=0.125 2023-12-23 10:00:51,844 INFO [train.py:886] (1/4) Epoch 34, batch 4500, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4951152.98 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:00:57,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1078520.0, ans=0.07 2023-12-23 10:01:03,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1078586.6666666667, ans=0.02 2023-12-23 10:01:09,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1078586.6666666667, ans=0.125 2023-12-23 10:01:16,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1078653.3333333333, ans=0.125 2023-12-23 10:01:27,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-12-23 10:01:34,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=12.0
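
The scaling.py:213 "ScheduledFloat" lines show that many regularization hyperparameters in this model (skip rates, balancer probabilities, dropout rates, scale_min values) are functions of batch_count rather than constants: each line prints the parameter name, the current batch_count, and the value (ans) its schedule currently yields. By batch_count ≈ 1.078e6 the same name prints the same ans from entry to entry, so these schedules have flattened out. Below is a sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are invented for illustration.

    def scheduled_float(batch_count: float, *points: "tuple[float, float]") -> float:
        # points: (batch_count, value) breakpoints in increasing batch_count
        # order; the value is held flat outside the first and last breakpoints
        # and linearly interpolated between neighbours.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # Hypothetical schedule: a skip rate decaying from 0.5 to 0.0 over the first
    # 50k batches has long since reached 0.0 at the batch_counts logged above.
    print(scheduled_float(1078053.3333333333, (0.0, 0.5), (50000.0, 0.0)))  # -> 0.0
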
2023-12-23 10:01:38,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1078786.6666666667, ans=0.0 2023-12-23 10:01:39,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1078786.6666666667, ans=0.125 2023-12-23 10:01:41,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1078786.6666666667, ans=0.125 2023-12-23 10:01:43,528 INFO [train.py:886] (1/4) Epoch 34, batch 4550, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4956347.55 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:01:48,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2023-12-23 10:02:26,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1079120.0, ans=0.125 2023-12-23 10:02:27,922 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:02:28,641 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.416e+01 3.558e+01 3.707e+01 4.537e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 10:02:32,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1079120.0, ans=0.0 2023-12-23 10:02:35,210 INFO [train.py:886] (1/4) Epoch 34, batch 4600, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4955647.07 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:02:37,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1079186.6666666667, ans=0.0 2023-12-23 10:02:41,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1079186.6666666667, ans=0.1 2023-12-23 10:02:41,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1079186.6666666667, ans=0.125 2023-12-23 10:02:52,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1079253.3333333333, ans=0.2 2023-12-23 10:03:04,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2023-12-23 10:03:11,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1079386.6666666667, ans=0.1 2023-12-23 10:03:13,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-12-23 10:03:19,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2023-12-23 10:03:27,343 INFO [train.py:886] (1/4) Epoch 34, batch 4650, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames.
], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4953781.17 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:03:33,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1079520.0, ans=0.0 2023-12-23 10:03:37,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2023-12-23 10:03:39,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1079586.6666666667, ans=0.1 2023-12-23 10:03:48,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1079653.3333333333, ans=0.125 2023-12-23 10:04:02,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1079720.0, ans=0.125 2023-12-23 10:04:04,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1079720.0, ans=0.0 2023-12-23 10:04:11,224 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.081e+01 3.447e+01 3.567e+01 3.799e+01 4.284e+01, threshold=7.135e+01, percent-clipped=0.0 2023-12-23 10:04:13,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1079786.6666666667, ans=0.125 2023-12-23 10:04:14,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1079786.6666666667, ans=0.1 2023-12-23 10:04:17,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2023-12-23 10:04:17,754 INFO [train.py:886] (1/4) Epoch 34, batch 4700, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4948551.56 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:04:22,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. 
limit=15.0 2023-12-23 10:04:35,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1079986.6666666667, ans=0.025 2023-12-23 10:04:42,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1079986.6666666667, ans=0.125 2023-12-23 10:04:44,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1079986.6666666667, ans=0.125 2023-12-23 10:04:49,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1080053.3333333333, ans=0.125 2023-12-23 10:04:53,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1080053.3333333333, ans=0.125 2023-12-23 10:04:57,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1080120.0, ans=0.125 2023-12-23 10:04:57,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=12.0 2023-12-23 10:05:00,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=15.0 2023-12-23 10:05:05,723 INFO [train.py:886] (1/4) Epoch 34, batch 4750, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4945162.75 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:05:40,662 INFO [train.py:886] (1/4) Epoch 35, batch 0, loss[loss=0.02425, audio_tagging_loss=0.02425, over 24036.00 frames. ], tot_loss[loss=0.02425, audio_tagging_loss=0.02425, over 24036.00 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:05:40,663 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 10:06:00,371 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3248, 3.5018, 4.2351, 3.8533], device='cuda:1') 2023-12-23 10:06:02,109 INFO [train.py:917] (1/4) Epoch 35, validation: loss=0.03353, audio_tagging_loss=0.03353, over 3737520.00 frames. 2023-12-23 10:06:02,110 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 10:06:08,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.55 vs. limit=10.0 2023-12-23 10:06:26,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1080426.6666666667, ans=0.0 2023-12-23 10:06:29,558 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.161e+01 3.510e+01 3.765e+01 4.838e+01 9.519e+01, threshold=7.530e+01, percent-clipped=6.0 2023-12-23 10:06:52,673 INFO [train.py:886] (1/4) Epoch 35, batch 50, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01903, audio_tagging_loss=0.01903, over 1119732.05 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:06:57,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.25 vs. limit=22.5
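
Two learning-rate movements are visible around this epoch boundary: within epoch 34 the lr creeps from 3.15e-03 down to 3.14e-03 as the batch count grows, and at "Epoch 35, batch 0" it steps to 3.10e-03. Both are consistent with an Eden-style scheduler, which multiplies the base learning rate by two power-law decay factors, one in the global batch index and one in the epoch index. The sketch below reproduces that formula from memory of icefall's Eden scheduler, so treat it as an assumption; the warmup factor is omitted because this log is far past warm-up, and the lr_batches/lr_epochs defaults are the recipe's usual values rather than numbers read off these lines.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth power-law decay in both the global batch index and the epoch
        # index; each factor is ~1.0 early on and decays like x**-0.5 later.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

Under this formula the epoch factor alone drops by about 1.4% between epochs 34 and 35, which together with the ongoing per-batch decay is in line with the logged step from 3.14e-03 at the end of epoch 34 to 3.10e-03 here.
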
2023-12-23 10:07:05,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1080693.3333333333, ans=0.125 2023-12-23 10:07:10,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1080693.3333333333, ans=0.125 2023-12-23 10:07:24,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1080826.6666666667, ans=0.125 2023-12-23 10:07:44,762 INFO [train.py:886] (1/4) Epoch 35, batch 100, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 1974194.56 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:07:54,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1081026.6666666667, ans=0.05 2023-12-23 10:07:55,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1081026.6666666667, ans=0.0 2023-12-23 10:08:12,433 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.410e+01 3.823e+01 4.080e+01 4.340e+01 5.302e+01, threshold=8.159e+01, percent-clipped=0.0 2023-12-23 10:08:36,360 INFO [train.py:886] (1/4) Epoch 35, batch 150, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 2641167.68 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:08:44,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1081293.3333333333, ans=0.125 2023-12-23 10:08:52,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1081360.0, ans=0.07 2023-12-23 10:08:52,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1081360.0, ans=0.125 2023-12-23 10:08:53,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1081360.0, ans=0.0 2023-12-23 10:09:01,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1081426.6666666667, ans=0.95 2023-12-23 10:09:02,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1081426.6666666667, ans=0.125 2023-12-23 10:09:13,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1081493.3333333333, ans=0.125 2023-12-23 10:09:28,092 INFO [train.py:886] (1/4) Epoch 35, batch 200, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24073.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 3156322.79 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
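
Each train.py:886 line carries two losses: loss[...] for the current batch and tot_loss[...], a running summary. The "over N frames" counts expose how that summary is kept: it restarts at a single batch (24036 frames) at "Epoch 35, batch 0", then covers 1.12M frames after batch 50, 1.97M after batch 100, 2.64M after batch 150 and 3.16M after batch 200, heading toward the ~4.95M steady state seen in the epoch-34 entries. That is exactly the trajectory of an exponentially decayed sum with a decay of (1 - 1/200) per batch at roughly 25k frames per batch; the constant 200 is an inference from these numbers, not something read out of the code. A sketch:

    class DecayedSum:
        # Running tot_loss as an exponentially decayed, frame-weighted average:
        # both the weighted loss and the frame count decay each batch, so the
        # report covers an effective window of ~1/(1-decay) recent batches.
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames  # the reported tot_loss

Fed ~25,000 frames per batch from a fresh start, self.frames reaches about 1.1e6 after 50 updates and saturates near 200 * 25e3 = 5e6, matching the frame counts in these lines.
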
2023-12-23 10:09:36,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1081626.6666666667, ans=0.1 2023-12-23 10:09:43,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1081693.3333333333, ans=0.125 2023-12-23 10:09:43,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1081693.3333333333, ans=0.125 2023-12-23 10:09:55,677 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.529e+01 3.676e+01 3.871e+01 4.435e+01, threshold=7.352e+01, percent-clipped=0.0 2023-12-23 10:09:58,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-12-23 10:10:18,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1081893.3333333333, ans=0.1 2023-12-23 10:10:20,455 INFO [train.py:886] (1/4) Epoch 35, batch 250, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 3558538.68 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:10:41,464 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:10:45,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1082093.3333333333, ans=0.2 2023-12-23 10:10:45,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2023-12-23 10:10:51,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082160.0, ans=0.1 2023-12-23 10:10:55,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1082160.0, ans=0.125 2023-12-23 10:11:11,753 INFO [train.py:886] (1/4) Epoch 35, batch 300, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 3865394.00 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:11:22,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-12-23 10:11:40,257 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.473e+01 3.613e+01 3.760e+01 4.806e+01, threshold=7.226e+01, percent-clipped=0.0 2023-12-23 10:11:52,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs.
limit=15.0 2023-12-23 10:11:54,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1082560.0, ans=0.2 2023-12-23 10:11:56,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1082560.0, ans=0.0 2023-12-23 10:11:56,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1082560.0, ans=0.04949747468305833 2023-12-23 10:12:04,017 INFO [train.py:886] (1/4) Epoch 35, batch 350, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4104670.87 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:12:06,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1082626.6666666667, ans=0.125 2023-12-23 10:12:09,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1082626.6666666667, ans=0.0 2023-12-23 10:12:12,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2023-12-23 10:12:18,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-12-23 10:12:38,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1082826.6666666667, ans=0.0 2023-12-23 10:12:57,060 INFO [train.py:886] (1/4) Epoch 35, batch 400, loss[loss=0.01215, audio_tagging_loss=0.01215, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4294602.62 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:12:57,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2023-12-23 10:12:59,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1082960.0, ans=0.0 2023-12-23 10:13:01,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1082960.0, ans=0.125 2023-12-23 10:13:04,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1082960.0, ans=0.0 2023-12-23 10:13:07,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2023-12-23 10:13:09,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1083026.6666666667, ans=0.2 2023-12-23 10:13:12,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. 
limit=22.5 2023-12-23 10:13:13,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1083026.6666666667, ans=0.0 2023-12-23 10:13:15,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1083026.6666666667, ans=0.125 2023-12-23 10:13:17,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1083093.3333333333, ans=0.015 2023-12-23 10:13:24,736 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.006e+01 3.393e+01 3.521e+01 3.659e+01 4.330e+01, threshold=7.042e+01, percent-clipped=0.0 2023-12-23 10:13:48,036 INFO [train.py:886] (1/4) Epoch 35, batch 450, loss[loss=0.009392, audio_tagging_loss=0.009392, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4441011.80 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:13:54,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.33 vs. limit=15.0 2023-12-23 10:14:00,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1083360.0, ans=0.0 2023-12-23 10:14:17,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1083426.6666666667, ans=0.1 2023-12-23 10:14:30,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1083560.0, ans=0.09899494936611666 2023-12-23 10:14:35,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2023-12-23 10:14:39,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-12-23 10:14:40,525 INFO [train.py:886] (1/4) Epoch 35, batch 500, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4553760.53 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:14:56,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1083693.3333333333, ans=0.0 2023-12-23 10:15:02,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1083760.0, ans=0.125 2023-12-23 10:15:09,076 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.163e+01 3.425e+01 3.572e+01 3.739e+01 4.112e+01, threshold=7.144e+01, percent-clipped=0.0 2023-12-23 10:15:17,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1083826.6666666667, ans=0.2 2023-12-23 10:15:23,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1083893.3333333333, ans=0.0 2023-12-23 10:15:29,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2023-12-23 10:15:32,478 INFO [train.py:886] (1/4) Epoch 35, batch 550, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. 
], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4638656.59 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:15:39,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1083960.0, ans=0.1 2023-12-23 10:15:42,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1084026.6666666667, ans=0.0 2023-12-23 10:15:44,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.35 vs. limit=22.5 2023-12-23 10:15:49,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1084026.6666666667, ans=0.125 2023-12-23 10:15:53,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1084093.3333333333, ans=0.0 2023-12-23 10:16:02,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084160.0, ans=0.1 2023-12-23 10:16:17,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2023-12-23 10:16:20,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1084226.6666666667, ans=0.0 2023-12-23 10:16:24,366 INFO [train.py:886] (1/4) Epoch 35, batch 600, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4708116.03 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:16:44,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1084360.0, ans=0.1 2023-12-23 10:16:44,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-12-23 10:16:52,610 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.983e+01 3.476e+01 3.624e+01 3.793e+01 4.486e+01, threshold=7.249e+01, percent-clipped=0.0 2023-12-23 10:16:58,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-12-23 10:17:08,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1084560.0, ans=0.0 2023-12-23 10:17:16,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1084626.6666666667, ans=0.125 2023-12-23 10:17:16,800 INFO [train.py:886] (1/4) Epoch 35, batch 650, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4760464.67 frames. 
], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:17:29,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1084693.3333333333, ans=0.025 2023-12-23 10:17:36,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1084760.0, ans=0.0 2023-12-23 10:17:42,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1084760.0, ans=0.125 2023-12-23 10:17:44,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=15.0 2023-12-23 10:17:50,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5 2023-12-23 10:17:56,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.48 vs. limit=22.5 2023-12-23 10:18:02,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1084893.3333333333, ans=0.0 2023-12-23 10:18:06,872 INFO [train.py:886] (1/4) Epoch 35, batch 700, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4792935.46 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:18:10,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1084960.0, ans=0.5 2023-12-23 10:18:13,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1084960.0, ans=0.125 2023-12-23 10:18:14,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1084960.0, ans=0.125 2023-12-23 10:18:18,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. limit=10.0 2023-12-23 10:18:35,078 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.416e+01 3.588e+01 3.767e+01 4.947e+01, threshold=7.176e+01, percent-clipped=0.0 2023-12-23 10:18:38,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=12.0 2023-12-23 10:18:51,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1085226.6666666667, ans=0.0 2023-12-23 10:18:52,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1085226.6666666667, ans=0.125 2023-12-23 10:18:54,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1085226.6666666667, ans=0.2 2023-12-23 10:18:54,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2023-12-23 10:18:59,687 INFO [train.py:886] (1/4) Epoch 35, batch 750, loss[loss=0.0139, audio_tagging_loss=0.0139, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4827871.07 frames. 
], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:19:00,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1085293.3333333333, ans=0.0 2023-12-23 10:19:03,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1085293.3333333333, ans=0.0 2023-12-23 10:19:31,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.07 vs. limit=10.0 2023-12-23 10:19:39,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1085560.0, ans=0.0 2023-12-23 10:19:51,857 INFO [train.py:886] (1/4) Epoch 35, batch 800, loss[loss=0.01032, audio_tagging_loss=0.01032, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4856562.31 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:19:52,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1085626.6666666667, ans=0.0 2023-12-23 10:20:05,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1085693.3333333333, ans=0.125 2023-12-23 10:20:13,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1085760.0, ans=0.0 2023-12-23 10:20:18,777 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+01 3.460e+01 3.638e+01 3.746e+01 4.332e+01, threshold=7.276e+01, percent-clipped=0.0 2023-12-23 10:20:42,801 INFO [train.py:886] (1/4) Epoch 35, batch 850, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4880382.99 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:20:47,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-12-23 10:20:56,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0 2023-12-23 10:21:09,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1086093.3333333333, ans=0.1 2023-12-23 10:21:14,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2023-12-23 10:21:29,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-12-23 10:21:35,518 INFO [train.py:886] (1/4) Epoch 35, batch 900, loss[loss=0.008844, audio_tagging_loss=0.008844, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4901526.98 frames. 
], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:21:35,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1086293.3333333333, ans=0.0 2023-12-23 10:21:45,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1086360.0, ans=0.1 2023-12-23 10:21:46,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2023-12-23 10:21:46,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2023-12-23 10:21:56,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1086426.6666666667, ans=0.1 2023-12-23 10:22:03,189 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.210e+01 3.429e+01 3.563e+01 3.739e+01 4.144e+01, threshold=7.126e+01, percent-clipped=0.0 2023-12-23 10:22:09,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1086493.3333333333, ans=0.125 2023-12-23 10:22:19,519 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:22:25,580 INFO [train.py:886] (1/4) Epoch 35, batch 950, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24935.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4912310.66 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:22:34,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1086626.6666666667, ans=0.2 2023-12-23 10:22:54,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1086760.0, ans=0.0 2023-12-23 10:23:11,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1086893.3333333333, ans=0.1 2023-12-23 10:23:12,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1086893.3333333333, ans=0.125 2023-12-23 10:23:18,177 INFO [train.py:886] (1/4) Epoch 35, batch 1000, loss[loss=0.01328, audio_tagging_loss=0.01328, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4919922.23 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:23:31,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1087026.6666666667, ans=0.5 2023-12-23 10:23:46,653 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.127e+01 3.397e+01 3.527e+01 3.697e+01 4.160e+01, threshold=7.054e+01, percent-clipped=0.0 2023-12-23 10:23:46,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1087093.3333333333, ans=0.125 2023-12-23 10:23:51,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1087160.0, ans=0.0 2023-12-23 10:24:10,466 INFO [train.py:886] (1/4) Epoch 35, batch 1050, loss[loss=0.00966, audio_tagging_loss=0.00966, over 24750.00 frames. 
], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4919906.89 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:24:14,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1087293.3333333333, ans=10.0 2023-12-23 10:24:17,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1087293.3333333333, ans=0.0 2023-12-23 10:24:24,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1087360.0, ans=0.1 2023-12-23 10:24:27,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.62 vs. limit=22.5 2023-12-23 10:24:50,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1087560.0, ans=0.0 2023-12-23 10:24:54,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1087560.0, ans=0.125 2023-12-23 10:25:00,975 INFO [train.py:886] (1/4) Epoch 35, batch 1100, loss[loss=0.01131, audio_tagging_loss=0.01131, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4931036.88 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:25:09,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1087626.6666666667, ans=0.125 2023-12-23 10:25:29,195 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.089e+01 3.393e+01 3.590e+01 3.785e+01 4.427e+01, threshold=7.180e+01, percent-clipped=0.0 2023-12-23 10:25:30,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1087760.0, ans=0.2 2023-12-23 10:25:53,743 INFO [train.py:886] (1/4) Epoch 35, batch 1150, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4942143.59 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:25:55,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1087960.0, ans=0.125 2023-12-23 10:26:10,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-12-23 10:26:24,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1088160.0, ans=0.0 2023-12-23 10:26:33,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1088160.0, ans=0.125 2023-12-23 10:26:34,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1088226.6666666667, ans=0.0 2023-12-23 10:26:35,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1088226.6666666667, ans=0.125 2023-12-23 10:26:40,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.89 vs. 
limit=22.5 2023-12-23 10:26:44,946 INFO [train.py:886] (1/4) Epoch 35, batch 1200, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4947364.30 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:26:49,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1088293.3333333333, ans=0.0 2023-12-23 10:26:56,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1088360.0, ans=0.0 2023-12-23 10:27:04,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1088360.0, ans=0.04949747468305833 2023-12-23 10:27:12,516 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.487e+01 3.620e+01 3.766e+01 4.374e+01, threshold=7.240e+01, percent-clipped=0.0 2023-12-23 10:27:31,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1088560.0, ans=0.2 2023-12-23 10:27:34,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1088560.0, ans=0.2 2023-12-23 10:27:36,894 INFO [train.py:886] (1/4) Epoch 35, batch 1250, loss[loss=0.0114, audio_tagging_loss=0.0114, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4942995.14 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:27:48,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1088693.3333333333, ans=0.125 2023-12-23 10:28:17,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1088893.3333333333, ans=0.2 2023-12-23 10:28:29,145 INFO [train.py:886] (1/4) Epoch 35, batch 1300, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4936841.44 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:28:49,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2023-12-23 10:28:57,284 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.442e+01 3.551e+01 3.705e+01 4.359e+01, threshold=7.103e+01, percent-clipped=0.0 2023-12-23 10:29:04,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=12.0 2023-12-23 10:29:19,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=12.0 2023-12-23 10:29:19,902 INFO [train.py:886] (1/4) Epoch 35, batch 1350, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4942314.32 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:29:23,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1089293.3333333333, ans=0.0 2023-12-23 10:29:35,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. 
limit=15.0 2023-12-23 10:29:50,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1089493.3333333333, ans=0.2 2023-12-23 10:29:56,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2023-12-23 10:30:12,251 INFO [train.py:886] (1/4) Epoch 35, batch 1400, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4943493.43 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:30:13,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-12-23 10:30:22,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-12-23 10:30:23,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1089693.3333333333, ans=0.1 2023-12-23 10:30:37,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1089760.0, ans=0.125 2023-12-23 10:30:37,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1089760.0, ans=0.1 2023-12-23 10:30:39,918 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.952e+01 3.418e+01 3.570e+01 3.781e+01 4.202e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 10:30:40,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-12-23 10:30:44,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1089826.6666666667, ans=0.125 2023-12-23 10:31:00,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2023-12-23 10:31:04,678 INFO [train.py:886] (1/4) Epoch 35, batch 1450, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4947715.96 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:31:16,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090026.6666666667, ans=0.0 2023-12-23 10:31:32,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=12.0
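
grad_scale is the mixed-precision loss-scaling factor, and it is dynamic: the stretch before Epoch 34, batch 4250 ran at 64.0, it dropped to 32.0 there, held at 32.0 through early epoch 35, and has doubled back to 64.0 by "Epoch 35, batch 1450" just above. That pattern is consistent with PyTorch-style dynamic loss scaling, where the scale is halved whenever gradients overflow and doubled after a long run of overflow-free steps; the gap between the drop and the recovery here is roughly 2000 batches, which happens to be GradScaler's default growth interval. A sketch of that policy follows; the constants are the usual defaults, assumed rather than taken from this run.

    class LossScale:
        # Dynamic loss scale consistent with the grad_scale values in the log:
        # back off 2x on overflow, grow 2x after a window of clean steps.
        # Mirrors torch.cuda.amp.GradScaler's policy; constants illustrative.
        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:
                self.scale *= 0.5
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= 2.0
                    self._good_steps = 0
            return self.scale
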
2023-12-23 10:31:35,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1090160.0, ans=0.125 2023-12-23 10:31:36,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1090160.0, ans=0.0 2023-12-23 10:31:51,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090226.6666666667, ans=0.0 2023-12-23 10:31:53,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1090293.3333333333, ans=0.5 2023-12-23 10:31:54,687 INFO [train.py:886] (1/4) Epoch 35, batch 1500, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4953727.63 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:31:55,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1090293.3333333333, ans=0.025 2023-12-23 10:32:17,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1090426.6666666667, ans=0.2 2023-12-23 10:32:22,402 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.067e+01 3.460e+01 3.584e+01 3.712e+01 4.259e+01, threshold=7.168e+01, percent-clipped=0.0 2023-12-23 10:32:26,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1090493.3333333333, ans=0.0 2023-12-23 10:32:30,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1090493.3333333333, ans=0.1 2023-12-23 10:32:46,368 INFO [train.py:886] (1/4) Epoch 35, batch 1550, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4947922.72 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:32:46,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1090626.6666666667, ans=0.2 2023-12-23 10:32:54,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1090626.6666666667, ans=0.125 2023-12-23 10:32:55,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1090693.3333333333, ans=0.0 2023-12-23 10:32:55,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1090693.3333333333, ans=0.125 2023-12-23 10:33:07,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1090760.0, ans=0.1 2023-12-23 10:33:20,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1090826.6666666667, ans=0.1 2023-12-23 10:33:21,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1090826.6666666667, ans=0.05 2023-12-23 10:33:21,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs.
limit=15.0 2023-12-23 10:33:28,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1090893.3333333333, ans=0.125 2023-12-23 10:33:29,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1090893.3333333333, ans=0.125 2023-12-23 10:33:31,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1090893.3333333333, ans=0.1 2023-12-23 10:33:34,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1090893.3333333333, ans=0.04949747468305833 2023-12-23 10:33:37,822 INFO [train.py:886] (1/4) Epoch 35, batch 1600, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4939331.16 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:33:42,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1090960.0, ans=0.0 2023-12-23 10:33:45,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1090960.0, ans=0.0 2023-12-23 10:33:56,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1091093.3333333333, ans=0.125 2023-12-23 10:34:03,965 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.514e+01 3.658e+01 3.797e+01 4.440e+01, threshold=7.316e+01, percent-clipped=0.0 2023-12-23 10:34:27,912 INFO [train.py:886] (1/4) Epoch 35, batch 1650, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4934674.57 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:34:38,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1091360.0, ans=0.125 2023-12-23 10:34:42,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1091360.0, ans=0.0 2023-12-23 10:34:55,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1091426.6666666667, ans=0.04949747468305833 2023-12-23 10:35:05,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1091493.3333333333, ans=0.2 2023-12-23 10:35:05,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2023-12-23 10:35:19,581 INFO [train.py:886] (1/4) Epoch 35, batch 1700, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4933100.91 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:35:22,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. 
limit=15.0 2023-12-23 10:35:27,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1091626.6666666667, ans=0.0 2023-12-23 10:35:34,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1091693.3333333333, ans=0.1 2023-12-23 10:35:46,552 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.407e+01 3.580e+01 3.752e+01 4.487e+01, threshold=7.159e+01, percent-clipped=0.0 2023-12-23 10:35:52,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-12-23 10:35:59,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1091893.3333333333, ans=0.0 2023-12-23 10:36:01,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1091893.3333333333, ans=15.0 2023-12-23 10:36:02,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2023-12-23 10:36:04,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1091893.3333333333, ans=0.2 2023-12-23 10:36:09,048 INFO [train.py:886] (1/4) Epoch 35, batch 1750, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4940554.63 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:36:28,332 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:36:33,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2023-12-23 10:36:44,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1092160.0, ans=0.1 2023-12-23 10:36:56,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1092226.6666666667, ans=0.035 2023-12-23 10:37:00,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1092226.6666666667, ans=0.125 2023-12-23 10:37:01,891 INFO [train.py:886] (1/4) Epoch 35, batch 1800, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4939633.88 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:37:29,524 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.056e+01 3.488e+01 3.614e+01 3.772e+01 4.751e+01, threshold=7.228e+01, percent-clipped=0.0 2023-12-23 10:37:42,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1092560.0, ans=0.1 2023-12-23 10:37:51,346 INFO [train.py:886] (1/4) Epoch 35, batch 1850, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4940259.59 frames. 
], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:38:10,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1092693.3333333333, ans=0.125 2023-12-23 10:38:17,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1092760.0, ans=0.05 2023-12-23 10:38:21,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1092826.6666666667, ans=0.125 2023-12-23 10:38:28,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1092826.6666666667, ans=0.0 2023-12-23 10:38:34,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1092893.3333333333, ans=0.1 2023-12-23 10:38:42,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1092960.0, ans=0.1 2023-12-23 10:38:42,691 INFO [train.py:886] (1/4) Epoch 35, batch 1900, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4934223.62 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:38:46,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1092960.0, ans=0.125 2023-12-23 10:38:49,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1092960.0, ans=0.0 2023-12-23 10:39:02,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1093026.6666666667, ans=0.125 2023-12-23 10:39:07,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-12-23 10:39:09,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1093093.3333333333, ans=0.0 2023-12-23 10:39:10,605 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.113e+01 3.449e+01 3.631e+01 3.772e+01 4.886e+01, threshold=7.262e+01, percent-clipped=0.0 2023-12-23 10:39:11,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1093093.3333333333, ans=0.125 2023-12-23 10:39:17,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1093160.0, ans=0.125 2023-12-23 10:39:27,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-23 10:39:33,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1093226.6666666667, ans=0.0 2023-12-23 10:39:35,181 INFO [train.py:886] (1/4) Epoch 35, batch 1950, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4930236.67 frames. 
], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:39:45,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1093293.3333333333, ans=0.125 2023-12-23 10:39:46,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1093360.0, ans=0.125 2023-12-23 10:39:47,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1093360.0, ans=0.125 2023-12-23 10:39:48,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1093360.0, ans=0.125 2023-12-23 10:39:58,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1093426.6666666667, ans=0.0 2023-12-23 10:40:08,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2023-12-23 10:40:15,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1093493.3333333333, ans=0.125 2023-12-23 10:40:27,286 INFO [train.py:886] (1/4) Epoch 35, batch 2000, loss[loss=0.009606, audio_tagging_loss=0.009606, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4935005.75 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:40:37,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093626.6666666667, ans=0.1 2023-12-23 10:40:55,725 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.152e+01 3.397e+01 3.574e+01 3.710e+01 4.548e+01, threshold=7.148e+01, percent-clipped=0.0 2023-12-23 10:40:55,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1093760.0, ans=0.0 2023-12-23 10:40:57,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-12-23 10:40:58,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1093826.6666666667, ans=0.1 2023-12-23 10:41:20,407 INFO [train.py:886] (1/4) Epoch 35, batch 2050, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4943314.48 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:41:23,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1093960.0, ans=0.125 2023-12-23 10:41:26,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1093960.0, ans=0.125 2023-12-23 10:41:26,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-12-23 10:41:28,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1093960.0, ans=0.0 2023-12-23 10:41:28,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. 
limit=15.0 2023-12-23 10:41:33,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-12-23 10:41:36,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-12-23 10:41:42,680 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:41:46,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0 2023-12-23 10:42:10,508 INFO [train.py:886] (1/4) Epoch 35, batch 2100, loss[loss=0.009846, audio_tagging_loss=0.009846, over 21614.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4942743.63 frames. ], batch size: 107, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:42:12,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1094293.3333333333, ans=0.125 2023-12-23 10:42:15,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1094293.3333333333, ans=0.125 2023-12-23 10:42:37,894 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.441e+01 3.605e+01 3.842e+01 4.378e+01, threshold=7.210e+01, percent-clipped=0.0 2023-12-23 10:42:47,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1094493.3333333333, ans=0.125 2023-12-23 10:42:47,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1094493.3333333333, ans=0.2 2023-12-23 10:42:59,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1094560.0, ans=0.125 2023-12-23 10:43:02,012 INFO [train.py:886] (1/4) Epoch 35, batch 2150, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4946959.98 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0 2023-12-23 10:43:02,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2023-12-23 10:43:28,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1094760.0, ans=0.125 2023-12-23 10:43:40,542 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:43:52,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2023-12-23 10:43:53,293 INFO [train.py:886] (1/4) Epoch 35, batch 2200, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4941893.29 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:43:56,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.87 vs. 
limit=15.0 2023-12-23 10:44:22,587 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.222e+01 3.461e+01 3.667e+01 3.781e+01 4.293e+01, threshold=7.335e+01, percent-clipped=0.0 2023-12-23 10:44:22,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1095093.3333333333, ans=0.0 2023-12-23 10:44:38,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1095226.6666666667, ans=0.1 2023-12-23 10:44:44,097 INFO [train.py:886] (1/4) Epoch 35, batch 2250, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4940475.15 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0 2023-12-23 10:44:50,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1095293.3333333333, ans=0.0 2023-12-23 10:44:50,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1095293.3333333333, ans=0.2 2023-12-23 10:44:53,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-12-23 10:44:57,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1095360.0, ans=0.0 2023-12-23 10:44:59,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1095360.0, ans=0.2 2023-12-23 10:45:03,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1095360.0, ans=0.1 2023-12-23 10:45:05,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1095426.6666666667, ans=0.0 2023-12-23 10:45:11,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1095426.6666666667, ans=0.0 2023-12-23 10:45:24,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1095560.0, ans=0.125 2023-12-23 10:45:28,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-12-23 10:45:31,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1095560.0, ans=0.0 2023-12-23 10:45:35,338 INFO [train.py:886] (1/4) Epoch 35, batch 2300, loss[loss=0.01099, audio_tagging_loss=0.01099, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4944772.45 frames. 
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:45:42,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1095626.6666666667, ans=0.1 2023-12-23 10:45:54,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1095693.3333333333, ans=0.125 2023-12-23 10:45:56,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1095760.0, ans=0.125 2023-12-23 10:46:03,978 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.037e+01 3.412e+01 3.576e+01 3.677e+01 4.204e+01, threshold=7.151e+01, percent-clipped=0.0 2023-12-23 10:46:04,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=12.0 2023-12-23 10:46:08,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1095826.6666666667, ans=0.2 2023-12-23 10:46:20,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1095893.3333333333, ans=0.02 2023-12-23 10:46:27,859 INFO [train.py:886] (1/4) Epoch 35, batch 2350, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4939066.70 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:46:49,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1096093.3333333333, ans=0.1 2023-12-23 10:47:01,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1096160.0, ans=10.0 2023-12-23 10:47:06,150 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:47:10,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1096226.6666666667, ans=0.1 2023-12-23 10:47:14,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1096226.6666666667, ans=0.125 2023-12-23 10:47:19,151 INFO [train.py:886] (1/4) Epoch 35, batch 2400, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4944874.29 frames. 
], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:47:19,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1096293.3333333333, ans=0.025 2023-12-23 10:47:25,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1096293.3333333333, ans=0.125 2023-12-23 10:47:41,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1096426.6666666667, ans=0.125 2023-12-23 10:47:42,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1096426.6666666667, ans=0.125 2023-12-23 10:47:48,610 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.451e+01 3.579e+01 3.689e+01 4.162e+01, threshold=7.158e+01, percent-clipped=0.0 2023-12-23 10:47:50,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1096493.3333333333, ans=0.125 2023-12-23 10:47:52,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2023-12-23 10:47:54,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1096493.3333333333, ans=0.0 2023-12-23 10:48:10,846 INFO [train.py:886] (1/4) Epoch 35, batch 2450, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4946271.56 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:48:20,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1096693.3333333333, ans=0.2 2023-12-23 10:48:25,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-12-23 10:48:43,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2023-12-23 10:48:57,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1096893.3333333333, ans=0.0 2023-12-23 10:49:01,558 INFO [train.py:886] (1/4) Epoch 35, batch 2500, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4943849.62 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:49:10,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1097026.6666666667, ans=0.0 2023-12-23 10:49:30,652 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.456e+01 3.603e+01 3.843e+01 4.868e+01, threshold=7.207e+01, percent-clipped=0.0 2023-12-23 10:49:44,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. limit=6.0 2023-12-23 10:49:52,983 INFO [train.py:886] (1/4) Epoch 35, batch 2550, loss[loss=0.01121, audio_tagging_loss=0.01121, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4945768.90 frames. 
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:50:08,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1097360.0, ans=0.125 2023-12-23 10:50:09,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-12-23 10:50:09,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1097360.0, ans=15.0 2023-12-23 10:50:30,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1097493.3333333333, ans=0.09899494936611666 2023-12-23 10:50:31,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1097493.3333333333, ans=0.1 2023-12-23 10:50:46,621 INFO [train.py:886] (1/4) Epoch 35, batch 2600, loss[loss=0.01014, audio_tagging_loss=0.01014, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4948659.49 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:50:46,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1097626.6666666667, ans=0.2 2023-12-23 10:50:50,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-12-23 10:51:14,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1097760.0, ans=0.1 2023-12-23 10:51:15,871 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.460e+01 3.619e+01 3.732e+01 4.232e+01, threshold=7.237e+01, percent-clipped=0.0 2023-12-23 10:51:21,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1097826.6666666667, ans=0.125 2023-12-23 10:51:28,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1097893.3333333333, ans=0.025 2023-12-23 10:51:29,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1097893.3333333333, ans=0.2 2023-12-23 10:51:29,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2023-12-23 10:51:36,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1097960.0, ans=0.0 2023-12-23 10:51:37,642 INFO [train.py:886] (1/4) Epoch 35, batch 2650, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4950479.37 frames. 
], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:51:47,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1097960.0, ans=0.1 2023-12-23 10:52:06,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1098093.3333333333, ans=0.125 2023-12-23 10:52:28,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1098226.6666666667, ans=0.125 2023-12-23 10:52:29,779 INFO [train.py:886] (1/4) Epoch 35, batch 2700, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4957094.05 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:52:31,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1098293.3333333333, ans=0.125 2023-12-23 10:52:58,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1098426.6666666667, ans=10.0 2023-12-23 10:52:59,082 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.192e+01 3.447e+01 3.571e+01 3.720e+01 4.339e+01, threshold=7.142e+01, percent-clipped=0.0 2023-12-23 10:53:16,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-12-23 10:53:18,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1098560.0, ans=0.125 2023-12-23 10:53:18,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2023-12-23 10:53:19,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1098560.0, ans=0.95 2023-12-23 10:53:21,974 INFO [train.py:886] (1/4) Epoch 35, batch 2750, loss[loss=0.008925, audio_tagging_loss=0.008925, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4955427.84 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:53:28,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-12-23 10:53:30,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1098693.3333333333, ans=10.0 2023-12-23 10:53:35,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1098693.3333333333, ans=0.0 2023-12-23 10:53:57,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2023-12-23 10:54:10,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-12-23 10:54:11,636 INFO [train.py:886] (1/4) Epoch 35, batch 2800, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4956894.02 frames. 
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:54:15,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0 2023-12-23 10:54:28,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1099026.6666666667, ans=0.125 2023-12-23 10:54:28,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1099026.6666666667, ans=0.0 2023-12-23 10:54:37,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1099093.3333333333, ans=0.2 2023-12-23 10:54:40,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-12-23 10:54:41,035 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.165e+01 3.455e+01 3.628e+01 3.845e+01 4.485e+01, threshold=7.256e+01, percent-clipped=0.0 2023-12-23 10:54:41,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1099093.3333333333, ans=10.0 2023-12-23 10:54:45,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1099160.0, ans=0.125 2023-12-23 10:54:47,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1099160.0, ans=0.05 2023-12-23 10:55:00,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1099226.6666666667, ans=0.125 2023-12-23 10:55:00,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-23 10:55:04,714 INFO [train.py:886] (1/4) Epoch 35, batch 2850, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4945242.04 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:55:05,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1099293.3333333333, ans=0.1 2023-12-23 10:55:06,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1099293.3333333333, ans=0.1 2023-12-23 10:55:35,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1099493.3333333333, ans=0.2 2023-12-23 10:55:42,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1099493.3333333333, ans=0.125 2023-12-23 10:55:49,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1099560.0, ans=0.0 2023-12-23 10:55:57,047 INFO [train.py:886] (1/4) Epoch 35, batch 2900, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4948331.86 frames. 
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:55:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1099626.6666666667, ans=0.125 2023-12-23 10:56:24,423 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.411e+01 3.569e+01 3.817e+01 4.301e+01, threshold=7.139e+01, percent-clipped=0.0 2023-12-23 10:56:40,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2023-12-23 10:56:48,113 INFO [train.py:886] (1/4) Epoch 35, batch 2950, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4956326.50 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:57:00,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1100026.6666666667, ans=0.1 2023-12-23 10:57:19,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1100160.0, ans=0.125 2023-12-23 10:57:22,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1100160.0, ans=0.1 2023-12-23 10:57:22,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0 2023-12-23 10:57:25,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.39 vs. limit=22.5 2023-12-23 10:57:28,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1100226.6666666667, ans=0.125 2023-12-23 10:57:30,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2023-12-23 10:57:38,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.63 vs. limit=15.0 2023-12-23 10:57:40,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1100293.3333333333, ans=0.125 2023-12-23 10:57:41,381 INFO [train.py:886] (1/4) Epoch 35, batch 3000, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4955957.59 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:57:41,381 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 10:57:54,319 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4584, 3.3020, 3.9864, 4.1267], device='cuda:1') 2023-12-23 10:58:02,714 INFO [train.py:917] (1/4) Epoch 35, validation: loss=0.03345, audio_tagging_loss=0.03345, over 3737520.00 frames. 
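The loss bookkeeping in these records is frame-weighted rather than per-batch: each tot_loss is the ratio of decayed running sums, which is why the frame totals are fractional and hover near 4.94M (about 25,000 frames per batch retained over an effective window of a few hundred batches), while the validation figure just computed is an undecayed sum over the full ~3.74M-frame dev set. A minimal sketch of that kind of tracker, assuming a decay of 0.995 per batch; the helper names and the decay constant are illustrative, not the trainer's actual tracker:

```python
import torch

def update(tracker: dict, loss: torch.Tensor, frames: float, decay: float = 0.995) -> None:
    # Decayed, frame-weighted running sums: the reported tot_loss is
    # loss_sum / frame_sum. The decay keeps the average focused on recent
    # batches and makes the "over N frames" totals fractional, as above.
    tracker["loss_sum"] = decay * tracker.get("loss_sum", 0.0) + loss.item() * frames
    tracker["frame_sum"] = decay * tracker.get("frame_sum", 0.0) + frames

def value(tracker: dict) -> float:
    return tracker["loss_sum"] / max(tracker["frame_sum"], 1.0)

tot = {}
for _ in range(2000):  # feed a constant 0.012 loss over 24750-frame batches
    update(tot, torch.tensor(0.012), 24750.0)
print(f"tot_loss={value(tot):.5f} over {tot['frame_sum']:.2f} frames")
# -> tot_loss=0.01200 over ~4.95M frames, the same order as the totals above
```

Under this assumption the steady-state frame total is 24750 / (1 - 0.995) ≈ 4.95M, matching the ≈4.93M-4.96M range logged throughout this epoch.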
2023-12-23 10:58:02,714 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 10:58:10,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1100293.3333333333, ans=0.5 2023-12-23 10:58:25,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1100426.6666666667, ans=0.125 2023-12-23 10:58:29,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=10.0 2023-12-23 10:58:30,593 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.128e+01 3.436e+01 3.631e+01 3.835e+01 4.770e+01, threshold=7.261e+01, percent-clipped=0.0 2023-12-23 10:58:36,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-12-23 10:58:45,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0 2023-12-23 10:58:54,476 INFO [train.py:886] (1/4) Epoch 35, batch 3050, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4958949.14 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:59:12,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1100693.3333333333, ans=0.125 2023-12-23 10:59:23,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1100760.0, ans=0.09899494936611666 2023-12-23 10:59:35,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-12-23 10:59:39,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2023-12-23 10:59:45,923 INFO [train.py:886] (1/4) Epoch 35, batch 3100, loss[loss=0.01073, audio_tagging_loss=0.01073, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4959961.05 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 10:59:51,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1100960.0, ans=15.0 2023-12-23 10:59:54,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1100960.0, ans=0.125 2023-12-23 10:59:58,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1101026.6666666667, ans=0.1 2023-12-23 11:00:15,065 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.536e+01 3.677e+01 3.842e+01 4.191e+01, threshold=7.354e+01, percent-clipped=0.0 2023-12-23 11:00:36,562 INFO [train.py:886] (1/4) Epoch 35, batch 3150, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4956465.75 frames. 
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:00:51,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-12-23 11:00:53,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2023-12-23 11:00:55,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1101360.0, ans=0.125 2023-12-23 11:00:59,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1101426.6666666667, ans=0.125 2023-12-23 11:01:03,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=12.0 2023-12-23 11:01:21,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1101560.0, ans=0.125 2023-12-23 11:01:28,648 INFO [train.py:886] (1/4) Epoch 35, batch 3200, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4951301.20 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:01:49,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2023-12-23 11:01:57,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.006e+01 3.428e+01 3.617e+01 3.805e+01 4.182e+01, threshold=7.234e+01, percent-clipped=0.0 2023-12-23 11:02:19,493 INFO [train.py:886] (1/4) Epoch 35, batch 3250, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4950054.43 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:02:20,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1101960.0, ans=0.125 2023-12-23 11:02:38,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1102093.3333333333, ans=0.125 2023-12-23 11:02:49,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1102160.0, ans=0.125 2023-12-23 11:02:56,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1102160.0, ans=0.125 2023-12-23 11:03:07,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1102226.6666666667, ans=0.125 2023-12-23 11:03:07,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1102226.6666666667, ans=0.95 2023-12-23 11:03:09,895 INFO [train.py:886] (1/4) Epoch 35, batch 3300, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4949310.83 frames. 
], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:03:39,475 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.083e+01 3.432e+01 3.586e+01 3.721e+01 4.248e+01, threshold=7.173e+01, percent-clipped=0.0 2023-12-23 11:03:39,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1102426.6666666667, ans=0.0 2023-12-23 11:03:41,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1102493.3333333333, ans=0.125 2023-12-23 11:03:51,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.94 vs. limit=22.5 2023-12-23 11:03:56,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1102560.0, ans=0.1 2023-12-23 11:03:57,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1102560.0, ans=0.1 2023-12-23 11:04:02,292 INFO [train.py:886] (1/4) Epoch 35, batch 3350, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4953882.93 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:04:09,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1102626.6666666667, ans=0.0 2023-12-23 11:04:14,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5 2023-12-23 11:04:52,946 INFO [train.py:886] (1/4) Epoch 35, batch 3400, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4957769.03 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:05:14,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1103093.3333333333, ans=0.0 2023-12-23 11:05:17,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1103093.3333333333, ans=0.125 2023-12-23 11:05:22,239 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.114e+01 3.482e+01 3.648e+01 3.813e+01 4.164e+01, threshold=7.297e+01, percent-clipped=0.0 2023-12-23 11:05:28,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2023-12-23 11:05:33,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1103160.0, ans=0.125 2023-12-23 11:05:35,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1103226.6666666667, ans=0.0 2023-12-23 11:05:45,977 INFO [train.py:886] (1/4) Epoch 35, batch 3450, loss[loss=0.00853, audio_tagging_loss=0.00853, over 21941.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4947131.10 frames. 
], batch size: 107, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:05:49,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1103293.3333333333, ans=0.0 2023-12-23 11:05:50,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1103293.3333333333, ans=0.2 2023-12-23 11:06:09,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1103426.6666666667, ans=0.2 2023-12-23 11:06:12,904 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:06:18,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1103493.3333333333, ans=0.0 2023-12-23 11:06:21,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.41 vs. limit=22.5 2023-12-23 11:06:26,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1103560.0, ans=0.2 2023-12-23 11:06:38,341 INFO [train.py:886] (1/4) Epoch 35, batch 3500, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4944621.47 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:06:44,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1103626.6666666667, ans=0.2 2023-12-23 11:06:49,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1103693.3333333333, ans=0.125 2023-12-23 11:06:51,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1103693.3333333333, ans=0.1 2023-12-23 11:07:07,534 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.194e+01 3.520e+01 3.666e+01 3.849e+01 4.626e+01, threshold=7.332e+01, percent-clipped=0.0 2023-12-23 11:07:28,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1103960.0, ans=0.125 2023-12-23 11:07:29,129 INFO [train.py:886] (1/4) Epoch 35, batch 3550, loss[loss=0.0101, audio_tagging_loss=0.0101, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4942334.11 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:07:34,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=12.0 2023-12-23 11:07:34,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1103960.0, ans=0.0 2023-12-23 11:07:46,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1104026.6666666667, ans=0.0 2023-12-23 11:07:48,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.01 vs. 
limit=22.5 2023-12-23 11:08:06,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1104160.0, ans=0.125 2023-12-23 11:08:11,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2023-12-23 11:08:15,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1104226.6666666667, ans=0.125 2023-12-23 11:08:21,774 INFO [train.py:886] (1/4) Epoch 35, batch 3600, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4951053.03 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:08:22,064 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:08:34,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1104360.0, ans=0.125 2023-12-23 11:08:44,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1104426.6666666667, ans=0.125 2023-12-23 11:08:51,252 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.469e+01 3.628e+01 3.807e+01 4.642e+01, threshold=7.257e+01, percent-clipped=0.0 2023-12-23 11:08:56,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-12-23 11:09:07,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0 2023-12-23 11:09:09,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2023-12-23 11:09:14,215 INFO [train.py:886] (1/4) Epoch 35, batch 3650, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4953773.12 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:09:46,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-12-23 11:09:56,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1104893.3333333333, ans=10.0 2023-12-23 11:10:02,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1104893.3333333333, ans=0.125 2023-12-23 11:10:05,076 INFO [train.py:886] (1/4) Epoch 35, batch 3700, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4959164.88 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:10:34,223 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.497e+01 3.613e+01 3.767e+01 4.191e+01, threshold=7.225e+01, percent-clipped=0.0 2023-12-23 11:10:52,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. 
limit=15.0 2023-12-23 11:10:56,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2023-12-23 11:10:58,103 INFO [train.py:886] (1/4) Epoch 35, batch 3750, loss[loss=0.009445, audio_tagging_loss=0.009445, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4957820.19 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:10:59,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-12-23 11:11:14,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1105360.0, ans=0.2 2023-12-23 11:11:29,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-12-23 11:11:39,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1105560.0, ans=0.125 2023-12-23 11:11:43,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-12-23 11:11:49,009 INFO [train.py:886] (1/4) Epoch 35, batch 3800, loss[loss=0.01068, audio_tagging_loss=0.01068, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4947609.90 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:11:55,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1105626.6666666667, ans=0.125 2023-12-23 11:11:55,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. limit=10.0 2023-12-23 11:11:56,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1105626.6666666667, ans=0.125 2023-12-23 11:12:12,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1105760.0, ans=0.125 2023-12-23 11:12:15,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1105760.0, ans=0.0 2023-12-23 11:12:15,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=12.0 2023-12-23 11:12:16,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1105760.0, ans=0.125 2023-12-23 11:12:17,674 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.190e+01 3.510e+01 3.632e+01 3.780e+01 4.294e+01, threshold=7.263e+01, percent-clipped=0.0 2023-12-23 11:12:20,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1105826.6666666667, ans=0.95 2023-12-23 11:12:24,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. 
limit=15.0 2023-12-23 11:12:24,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1105826.6666666667, ans=0.125 2023-12-23 11:12:32,102 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:12:37,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-12-23 11:12:41,330 INFO [train.py:886] (1/4) Epoch 35, batch 3850, loss[loss=0.01028, audio_tagging_loss=0.01028, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4944293.03 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:12:44,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1105960.0, ans=0.035 2023-12-23 11:13:06,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1106093.3333333333, ans=0.125 2023-12-23 11:13:11,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1106160.0, ans=0.1 2023-12-23 11:13:15,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2023-12-23 11:13:22,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.98 vs. limit=22.5 2023-12-23 11:13:33,065 INFO [train.py:886] (1/4) Epoch 35, batch 3900, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4949444.22 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:13:35,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1106293.3333333333, ans=0.125 2023-12-23 11:13:39,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1106293.3333333333, ans=0.125 2023-12-23 11:13:47,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1106360.0, ans=0.0 2023-12-23 11:14:01,024 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.458e+01 3.613e+01 3.736e+01 4.379e+01, threshold=7.225e+01, percent-clipped=0.0 2023-12-23 11:14:04,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-12-23 11:14:07,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1106493.3333333333, ans=0.0 2023-12-23 11:14:09,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106493.3333333333, ans=0.1 2023-12-23 11:14:17,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-12-23 11:14:22,789 INFO [train.py:886] (1/4) Epoch 35, batch 3950, loss[loss=0.01275, audio_tagging_loss=0.01275, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4954166.43 frames. 
2023-12-23 11:14:24,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1106626.6666666667, ans=0.125
2023-12-23 11:14:48,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1106760.0, ans=0.125
2023-12-23 11:14:54,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1106826.6666666667, ans=0.1
2023-12-23 11:14:57,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1106826.6666666667, ans=0.0
2023-12-23 11:15:14,828 INFO [train.py:886] (1/4) Epoch 35, batch 4000, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4952274.06 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0
2023-12-23 11:15:15,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1106960.0, ans=0.125
2023-12-23 11:15:22,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1106960.0, ans=0.1
2023-12-23 11:15:23,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1107026.6666666667, ans=0.0
2023-12-23 11:15:26,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1107026.6666666667, ans=0.1
2023-12-23 11:15:42,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=22.5
2023-12-23 11:15:42,969 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.176e+01 3.470e+01 3.615e+01 3.743e+01 4.164e+01, threshold=7.230e+01, percent-clipped=0.0
2023-12-23 11:15:43,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1107093.3333333333, ans=0.0
2023-12-23 11:15:44,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1107160.0, ans=0.09899494936611666
2023-12-23 11:15:54,731 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:15:57,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1107226.6666666667, ans=0.0
2023-12-23 11:15:57,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1107226.6666666667, ans=0.2
2023-12-23 11:15:58,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1107226.6666666667, ans=0.95
2023-12-23 11:16:03,890 INFO [train.py:886] (1/4) Epoch 35, batch 4050, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4948822.11 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0
2023-12-23 11:16:04,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1107293.3333333333, ans=0.125
2023-12-23 11:16:10,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1107293.3333333333, ans=0.125
2023-12-23 11:16:13,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1107360.0, ans=0.95
2023-12-23 11:16:24,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1107426.6666666667, ans=0.1
2023-12-23 11:16:25,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1107426.6666666667, ans=0.0
2023-12-23 11:16:40,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1107493.3333333333, ans=10.0
2023-12-23 11:16:53,989 INFO [train.py:886] (1/4) Epoch 35, batch 4100, loss[loss=0.01045, audio_tagging_loss=0.01045, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4947864.55 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0
2023-12-23 11:17:23,018 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.082e+01 3.446e+01 3.615e+01 3.831e+01 4.582e+01, threshold=7.231e+01, percent-clipped=0.0
2023-12-23 11:17:29,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1107826.6666666667, ans=0.125
2023-12-23 11:17:40,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1107893.3333333333, ans=0.125
2023-12-23 11:17:46,696 INFO [train.py:886] (1/4) Epoch 35, batch 4150, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4946686.58 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0
2023-12-23 11:17:48,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5
2023-12-23 11:17:57,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1108026.6666666667, ans=0.07
2023-12-23 11:18:12,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0
2023-12-23 11:18:21,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1108160.0, ans=0.125
2023-12-23 11:18:26,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1108226.6666666667, ans=0.0
2023-12-23 11:18:28,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1108226.6666666667, ans=0.1
2023-12-23 11:18:36,271 INFO [train.py:886] (1/4) Epoch 35, batch 4200, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4944622.00 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 64.0
2023-12-23 11:18:46,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1108360.0, ans=0.125
2023-12-23 11:18:49,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1108360.0, ans=0.1
2023-12-23 11:19:04,621 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.094e+01 3.379e+01 3.552e+01 3.711e+01 4.184e+01, threshold=7.105e+01, percent-clipped=0.0
2023-12-23 11:19:24,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1108560.0, ans=0.1
2023-12-23 11:19:27,416 INFO [train.py:886] (1/4) Epoch 35, batch 4250, loss[loss=0.009895, audio_tagging_loss=0.009895, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4943080.61 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0
2023-12-23 11:19:46,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0
2023-12-23 11:19:50,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0
2023-12-23 11:19:58,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1108826.6666666667, ans=0.125
2023-12-23 11:20:18,356 INFO [train.py:886] (1/4) Epoch 35, batch 4300, loss[loss=0.00983, audio_tagging_loss=0.00983, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4951369.99 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0
2023-12-23 11:20:20,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0
2023-12-23 11:20:21,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1108960.0, ans=0.0
2023-12-23 11:20:25,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0
2023-12-23 11:20:32,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1109026.6666666667, ans=0.0
2023-12-23 11:20:41,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0
2023-12-23 11:20:46,999 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.212e+01 3.455e+01 3.593e+01 3.734e+01 4.513e+01, threshold=7.186e+01, percent-clipped=0.0
2023-12-23 11:21:10,819 INFO [train.py:886] (1/4) Epoch 36, batch 4350, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4957196.49 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0
2023-12-23 11:21:38,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0
2023-12-23 11:22:01,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5
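In the optim.py:484 warnings, the five numbers read as the min/25%/50%/75%/max of recent gradient norms, and the logged threshold tracks Clipping_scale times the median (e.g. 2.0 x 3.552e+01 = 7.104e+01, cf. threshold=7.105e+01 above). A small hedged sketch of that bookkeeping, not icefall's exact code:

    import torch

    # Hypothetical helper: quartiles of recent grad norms plus a clipping
    # threshold, assuming threshold = clipping_scale * median as the logged
    # numbers suggest.
    def grad_norm_stats(grad_norms, clipping_scale=2.0):
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]                    # 2.0 * median
        pct = (grad_norms > threshold).float().mean() * 100  # "percent-clipped"
        return q, threshold, pct

    norms = torch.tensor([30.94, 33.79, 35.52, 37.11, 41.84])  # quartiles above
    q, thr, pct = grad_norm_stats(norms)
    print(float(thr), float(pct))  # ~71.05 and 0.0, matching the warning line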
2023-12-23 11:22:03,357 INFO [train.py:886] (1/4) Epoch 35, batch 4400, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4950595.96 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 64.0
2023-12-23 11:22:06,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0
2023-12-23 11:22:28,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1109760.0, ans=0.0
2023-12-23 11:22:31,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=12.0
2023-12-23 11:22:32,613 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.551e+01 3.706e+01 3.864e+01 4.550e+01, threshold=7.411e+01, percent-clipped=0.0
2023-12-23 11:22:40,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1109826.6666666667, ans=0.125
2023-12-23 11:22:40,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1109826.6666666667, ans=0.0
2023-12-23 11:22:43,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1109893.3333333333, ans=0.125
2023-12-23 11:22:53,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1109960.0, ans=0.125
2023-12-23 11:22:54,181 INFO [train.py:886] (1/4) Epoch 35, batch 4450, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4952053.99 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:23:05,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1110026.6666666667, ans=0.1
2023-12-23 11:23:40,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1110226.6666666667, ans=0.025
2023-12-23 11:23:44,638 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:23:47,264 INFO [train.py:886] (1/4) Epoch 35, batch 4500, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4950996.03 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:23:52,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5
2023-12-23 11:24:16,596 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.136e+01 3.438e+01 3.620e+01 3.863e+01 4.550e+01, threshold=7.241e+01, percent-clipped=0.0
2023-12-23 11:24:20,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0
2023-12-23 11:24:28,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1110560.0, ans=22.5
2023-12-23 11:24:33,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1110560.0, ans=0.0
2023-12-23 11:24:39,143 INFO [train.py:886] (1/4) Epoch 35, batch 4550, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4951429.15 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:24:48,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1110693.3333333333, ans=0.125
2023-12-23 11:24:52,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1110693.3333333333, ans=0.125
2023-12-23 11:25:09,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1110826.6666666667, ans=0.125
2023-12-23 11:25:12,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0
2023-12-23 11:25:12,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0
2023-12-23 11:25:26,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1110893.3333333333, ans=0.125
2023-12-23 11:25:29,953 INFO [train.py:886] (1/4) Epoch 35, batch 4600, loss[loss=0.01043, audio_tagging_loss=0.01043, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4954257.45 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:25:46,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1111026.6666666667, ans=0.2
2023-12-23 11:25:48,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1111026.6666666667, ans=0.1
2023-12-23 11:25:59,321 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.428e+01 3.565e+01 3.717e+01 4.348e+01, threshold=7.130e+01, percent-clipped=0.0
2023-12-23 11:26:22,427 INFO [train.py:886] (1/4) Epoch 35, batch 4650, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4959534.82 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:26:46,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1111426.6666666667, ans=0.0
2023-12-23 11:27:13,086 INFO [train.py:886] (1/4) Epoch 35, batch 4700, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4953060.93 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:27:34,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1111760.0, ans=0.0
2023-12-23 11:27:38,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=12.0
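The scaling.py:213 entries sample schedule-controlled hyperparameters (dropout probabilities, skip rates, balancer probabilities and limits): each ScheduledFloat is piecewise-linear in batch_count, and by batch_count of roughly 1.1e6 the values logged here (0.125, 0.1, 0.0, ...) have long since settled at their final points. A hedged sketch of such a piecewise-linear schedule; the breakpoints below are illustrative, not the run's actual ones:

    # Sketch of a piecewise-linear schedule in the spirit of zipformer's
    # ScheduledFloat; the (batch_count, value) breakpoints are made up.
    def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.1))):
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation between points
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

    print(scheduled_float(0.0))         # 0.3 at the start of training
    print(scheduled_float(1106493.33))  # 0.1, as in the dropout_p entries above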
2023-12-23 11:27:39,777 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.244e+01 3.493e+01 3.654e+01 3.823e+01 4.545e+01, threshold=7.308e+01, percent-clipped=0.0
2023-12-23 11:27:51,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1111893.3333333333, ans=0.1
2023-12-23 11:27:53,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5
2023-12-23 11:28:00,261 INFO [train.py:886] (1/4) Epoch 35, batch 4750, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4949512.42 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0
2023-12-23 11:28:02,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1111960.0, ans=0.125
2023-12-23 11:28:09,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0
2023-12-23 11:28:35,990 INFO [train.py:886] (1/4) Epoch 36, batch 0, loss[loss=0.02497, audio_tagging_loss=0.02497, over 23977.00 frames. ], tot_loss[loss=0.02497, audio_tagging_loss=0.02497, over 23977.00 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:28:35,991 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 11:28:52,684 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1779, 0.9081, 4.4490, 4.3385], device='cuda:1')
2023-12-23 11:28:56,823 INFO [train.py:917] (1/4) Epoch 36, validation: loss=0.0339, audio_tagging_loss=0.0339, over 3737520.00 frames.
2023-12-23 11:28:56,823 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 11:28:57,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0
2023-12-23 11:29:07,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1112133.3333333333, ans=0.025
2023-12-23 11:29:07,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1112133.3333333333, ans=0.125
2023-12-23 11:29:07,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1112133.3333333333, ans=0.2
2023-12-23 11:29:34,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0
2023-12-23 11:29:48,234 INFO [train.py:886] (1/4) Epoch 36, batch 50, loss[loss=0.01794, audio_tagging_loss=0.01794, over 25000.00 frames. ], tot_loss[loss=0.01908, audio_tagging_loss=0.01908, over 1113783.60 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:29:53,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1112400.0, ans=0.02
2023-12-23 11:29:56,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1112400.0, ans=0.0
2023-12-23 11:30:01,871 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.062e+01 3.818e+01 4.375e+01 4.992e+01 9.452e+01, threshold=8.751e+01, percent-clipped=8.0
2023-12-23 11:30:09,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1112533.3333333333, ans=0.0
2023-12-23 11:30:23,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=12.0
2023-12-23 11:30:30,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1112666.6666666667, ans=0.0
2023-12-23 11:30:32,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112666.6666666667, ans=0.1
2023-12-23 11:30:34,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1112666.6666666667, ans=0.125
2023-12-23 11:30:40,126 INFO [train.py:886] (1/4) Epoch 36, batch 100, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 1962295.84 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:31:14,907 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:31:17,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=12.0
2023-12-23 11:31:24,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1113000.0, ans=0.125
2023-12-23 11:31:31,079 INFO [train.py:886] (1/4) Epoch 36, batch 150, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 2630243.39 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:31:40,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1113133.3333333333, ans=0.0
2023-12-23 11:31:41,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1113133.3333333333, ans=0.125
2023-12-23 11:31:43,988 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.213e+01 3.709e+01 3.865e+01 4.019e+01 4.619e+01, threshold=7.729e+01, percent-clipped=0.0
2023-12-23 11:31:55,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1113200.0, ans=0.125
2023-12-23 11:31:55,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1113200.0, ans=0.125
2023-12-23 11:32:16,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0
2023-12-23 11:32:22,761 INFO [train.py:886] (1/4) Epoch 36, batch 200, loss[loss=0.01482, audio_tagging_loss=0.01482, over 24750.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 3150475.36 frames. ], batch size: 99, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:32:32,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1113400.0, ans=0.1
2023-12-23 11:32:33,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1113466.6666666667, ans=0.125
2023-12-23 11:32:33,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0
2023-12-23 11:32:46,066 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:32:50,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0
2023-12-23 11:33:00,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5
2023-12-23 11:33:15,320 INFO [train.py:886] (1/4) Epoch 36, batch 250, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 3552397.39 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:33:19,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1113733.3333333333, ans=0.1
2023-12-23 11:33:19,265 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:33:28,143 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.045e+01 3.532e+01 3.658e+01 3.837e+01 4.468e+01, threshold=7.316e+01, percent-clipped=0.0
2023-12-23 11:33:35,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1113866.6666666667, ans=0.2
2023-12-23 11:33:54,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1113933.3333333333, ans=0.2
2023-12-23 11:33:58,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0
2023-12-23 11:34:06,869 INFO [train.py:886] (1/4) Epoch 36, batch 300, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 3864287.61 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:34:13,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1114066.6666666667, ans=0.0
2023-12-23 11:34:13,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0
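The scaling.py:1022 lines compare a whiteness statistic of a module's activations against a limit. A natural reading, assumed here from zipformer's scaling.py rather than stated in the log, is that the metric is mean(eig^2) / mean(eig)^2 over the eigenvalues of the feature covariance: it is at least 1.0, equals 1.0 exactly for white features, and a penalty gradient engages only when it exceeds the limit. A sketch under that assumption:

    import torch

    # Assumed whitening metric: mean(eig^2) / mean(eig)^2 of the (centered)
    # feature covariance; larger values mean less white (more correlated) features.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x.reshape(-1, x.shape[-1])
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    feats = torch.randn(1000, 512) @ torch.randn(512, 512)  # correlated features
    print(whitening_metric(feats), "vs. limit=22.5")  # penalty only above the limit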
2023-12-23 11:34:14,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114066.6666666667, ans=0.1
2023-12-23 11:34:18,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0
2023-12-23 11:34:35,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1114200.0, ans=0.125
2023-12-23 11:34:43,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1114266.6666666667, ans=0.125
2023-12-23 11:34:44,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1114266.6666666667, ans=0.125
2023-12-23 11:34:46,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1114266.6666666667, ans=0.0
2023-12-23 11:34:58,115 INFO [train.py:886] (1/4) Epoch 36, batch 350, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4104571.09 frames. ], batch size: 99, lr: 3.01e-03, grad_scale: 32.0
2023-12-23 11:35:12,703 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.147e+01 3.467e+01 3.650e+01 3.765e+01 4.145e+01, threshold=7.301e+01, percent-clipped=0.0
2023-12-23 11:35:26,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1114533.3333333333, ans=0.125
2023-12-23 11:35:27,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1114533.3333333333, ans=0.125
2023-12-23 11:35:34,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0
2023-12-23 11:35:42,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1114666.6666666667, ans=0.125
2023-12-23 11:35:46,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=22.5
2023-12-23 11:35:48,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1114666.6666666667, ans=0.125
2023-12-23 11:35:51,307 INFO [train.py:886] (1/4) Epoch 36, batch 400, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4291824.27 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:35:58,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1114733.3333333333, ans=0.125
2023-12-23 11:36:02,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0
2023-12-23 11:36:10,498 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:36:13,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1114866.6666666667, ans=0.125
2023-12-23 11:36:28,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1114933.3333333333, ans=0.125
2023-12-23 11:36:33,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1115000.0, ans=0.0
2023-12-23 11:36:42,391 INFO [train.py:886] (1/4) Epoch 36, batch 450, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4441838.87 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:36:56,756 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.426e+01 3.574e+01 3.756e+01 4.682e+01, threshold=7.147e+01, percent-clipped=0.0
2023-12-23 11:37:07,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1115200.0, ans=0.025
2023-12-23 11:37:09,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115200.0, ans=0.1
2023-12-23 11:37:14,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1115266.6666666667, ans=0.2
2023-12-23 11:37:22,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1115266.6666666667, ans=0.95
2023-12-23 11:37:34,602 INFO [train.py:886] (1/4) Epoch 36, batch 500, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4553769.03 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:37:37,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1115400.0, ans=0.125
2023-12-23 11:37:46,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115466.6666666667, ans=0.1
2023-12-23 11:38:15,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1115666.6666666667, ans=0.125
2023-12-23 11:38:26,151 INFO [train.py:886] (1/4) Epoch 36, batch 550, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4646237.57 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:38:39,184 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.207e+01 3.469e+01 3.649e+01 3.829e+01 4.187e+01, threshold=7.298e+01, percent-clipped=0.0
2023-12-23 11:38:44,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1115800.0, ans=0.125
2023-12-23 11:38:50,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115866.6666666667, ans=0.1
2023-12-23 11:39:02,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1115933.3333333333, ans=0.04949747468305833
2023-12-23 11:39:04,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0
2023-12-23 11:39:04,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1115933.3333333333, ans=0.2
2023-12-23 11:39:17,353 INFO [train.py:886] (1/4) Epoch 36, batch 600, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4708301.23 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:39:21,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116066.6666666667, ans=0.1
2023-12-23 11:39:25,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1116066.6666666667, ans=0.125
2023-12-23 11:39:31,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1116133.3333333333, ans=0.95
2023-12-23 11:39:39,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1116200.0, ans=0.125
2023-12-23 11:39:44,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1116200.0, ans=0.0
2023-12-23 11:40:05,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1116333.3333333333, ans=0.125
2023-12-23 11:40:08,966 INFO [train.py:886] (1/4) Epoch 36, batch 650, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4759020.60 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:40:11,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1116400.0, ans=0.0
2023-12-23 11:40:15,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1116400.0, ans=0.125
2023-12-23 11:40:21,169 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.505e+01 3.653e+01 3.781e+01 4.331e+01, threshold=7.305e+01, percent-clipped=0.0
2023-12-23 11:40:36,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1116533.3333333333, ans=0.0
2023-12-23 11:40:40,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1116600.0, ans=0.125
2023-12-23 11:40:49,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1116666.6666666667, ans=0.0
2023-12-23 11:40:52,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1116666.6666666667, ans=0.2
2023-12-23 11:41:00,084 INFO [train.py:886] (1/4) Epoch 36, batch 700, loss[loss=0.01125, audio_tagging_loss=0.01125, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4798586.95 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:41:16,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116800.0, ans=0.1
2023-12-23 11:41:46,829 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:41:52,375 INFO [train.py:886] (1/4) Epoch 36, batch 750, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4831951.93 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:42:02,121 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:42:06,172 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.142e+01 3.455e+01 3.620e+01 3.726e+01 4.614e+01, threshold=7.241e+01, percent-clipped=0.0
2023-12-23 11:42:07,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1117133.3333333333, ans=0.125
2023-12-23 11:42:13,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1117200.0, ans=0.125
2023-12-23 11:42:17,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5
2023-12-23 11:42:22,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1117200.0, ans=0.125
2023-12-23 11:42:28,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117266.6666666667, ans=0.1
2023-12-23 11:42:45,292 INFO [train.py:886] (1/4) Epoch 36, batch 800, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4862651.34 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
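In the train.py:886 lines, the first loss[...] is the current batch and tot_loss[...] is a smoothed value: its "over ~4.95M frames" figure equals roughly reset_interval=200 batches of ~25000 frames, which is consistent with a decaying running sum of per-batch statistics (an assumption; the exact bookkeeping lives in icefall's MetricsTracker). A toy sketch of that kind of smoothing:

    # Toy sketch, not icefall's exact code: a decaying running sum of
    # (loss * frames, frames); with decay = 1 - 1/200 the frame total
    # converges to ~200 batches * 25000 frames = 5.0e6, as in the log.
    def smoothed_loss(batch_losses, batch_frames, decay=1 - 1 / 200):
        loss_sum = frame_sum = 0.0
        for loss, frames in zip(batch_losses, batch_frames):
            loss_sum = loss_sum * decay + loss * frames
            frame_sum = frame_sum * decay + frames
        return loss_sum / frame_sum, frame_sum

    tot, n = smoothed_loss([0.012] * 1000, [25000.0] * 1000)
    print(tot, n)  # 0.012 over ~4.97e6 frames, cf. "over 4944293.03 frames."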
2023-12-23 11:42:47,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2023-12-23 11:42:47,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0
2023-12-23 11:42:52,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1117400.0, ans=0.0
2023-12-23 11:42:52,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1117400.0, ans=0.07
2023-12-23 11:42:54,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1117466.6666666667, ans=0.025
2023-12-23 11:42:56,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117466.6666666667, ans=0.1
2023-12-23 11:43:00,690 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 11:43:11,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1117533.3333333333, ans=0.0
2023-12-23 11:43:18,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1117600.0, ans=0.125
2023-12-23 11:43:36,797 INFO [train.py:886] (1/4) Epoch 36, batch 850, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4883731.67 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:43:38,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1117733.3333333333, ans=0.125
2023-12-23 11:43:39,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1117733.3333333333, ans=0.125
2023-12-23 11:43:46,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0
2023-12-23 11:43:50,410 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.033e+01 3.502e+01 3.619e+01 3.758e+01 4.758e+01, threshold=7.237e+01, percent-clipped=0.0
2023-12-23 11:43:53,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1117800.0, ans=0.125
2023-12-23 11:44:00,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2023-12-23 11:44:04,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1117866.6666666667, ans=0.0
2023-12-23 11:44:29,657 INFO [train.py:886] (1/4) Epoch 36, batch 900, loss[loss=0.01135, audio_tagging_loss=0.01135, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4898831.03 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:44:33,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1118066.6666666667, ans=0.125
2023-12-23 11:45:02,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1118266.6666666667, ans=0.125
2023-12-23 11:45:13,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1118333.3333333333, ans=0.125
2023-12-23 11:45:21,018 INFO [train.py:886] (1/4) Epoch 36, batch 950, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4907197.87 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:45:24,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.48 vs. limit=12.0
2023-12-23 11:45:25,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1118400.0, ans=0.125
2023-12-23 11:45:34,594 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.217e+01 3.528e+01 3.631e+01 3.836e+01 4.993e+01, threshold=7.263e+01, percent-clipped=0.0
2023-12-23 11:45:53,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1118600.0, ans=0.125
2023-12-23 11:46:12,683 INFO [train.py:886] (1/4) Epoch 36, batch 1000, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4913200.99 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:46:27,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0
2023-12-23 11:46:31,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1118800.0, ans=0.05
2023-12-23 11:46:32,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1118866.6666666667, ans=0.05
2023-12-23 11:46:44,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1118933.3333333333, ans=0.0
2023-12-23 11:47:04,984 INFO [train.py:886] (1/4) Epoch 36, batch 1050, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4923488.06 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:47:05,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1119066.6666666667, ans=0.2
2023-12-23 11:47:07,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=15.0
2023-12-23 11:47:08,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1119066.6666666667, ans=0.1
2023-12-23 11:47:18,040 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.121e+01 3.507e+01 3.657e+01 3.818e+01 4.217e+01, threshold=7.313e+01, percent-clipped=0.0
2023-12-23 11:47:27,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1119200.0, ans=0.0
2023-12-23 11:47:45,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1119333.3333333333, ans=0.05
2023-12-23 11:47:56,207 INFO [train.py:886] (1/4) Epoch 36, batch 1100, loss[loss=0.01109, audio_tagging_loss=0.01109, over 21855.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4925611.16 frames. ], batch size: 107, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:48:12,011 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.097e-02
2023-12-23 11:48:14,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1119466.6666666667, ans=0.125
2023-12-23 11:48:16,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1119466.6666666667, ans=0.125
2023-12-23 11:48:38,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1119666.6666666667, ans=0.125
2023-12-23 11:48:43,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1119666.6666666667, ans=0.0
2023-12-23 11:48:45,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0
2023-12-23 11:48:48,639 INFO [train.py:886] (1/4) Epoch 36, batch 1150, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4935473.95 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:48:53,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1119733.3333333333, ans=0.2
2023-12-23 11:49:00,923 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.437e+01 3.573e+01 3.725e+01 4.671e+01, threshold=7.145e+01, percent-clipped=0.0
2023-12-23 11:49:13,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1119866.6666666667, ans=0.1
2023-12-23 11:49:16,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1119866.6666666667, ans=0.0
2023-12-23 11:49:18,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1119933.3333333333, ans=0.0
2023-12-23 11:49:23,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1119933.3333333333, ans=10.0
2023-12-23 11:49:26,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1119933.3333333333, ans=0.125
2023-12-23 11:49:39,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1120000.0, ans=0.2
2023-12-23 11:49:41,573 INFO [train.py:886] (1/4) Epoch 36, batch 1200, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4939657.49 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:50:05,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1120200.0, ans=0.0
2023-12-23 11:50:12,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1120266.6666666667, ans=0.0
2023-12-23 11:50:20,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1120266.6666666667, ans=0.125
2023-12-23 11:50:32,363 INFO [train.py:886] (1/4) Epoch 36, batch 1250, loss[loss=0.009599, audio_tagging_loss=0.009599, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4940823.23 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:50:45,961 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.454e+01 3.601e+01 3.737e+01 4.840e+01, threshold=7.203e+01, percent-clipped=0.0
2023-12-23 11:50:47,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0
2023-12-23 11:51:24,565 INFO [train.py:886] (1/4) Epoch 36, batch 1300, loss[loss=0.01406, audio_tagging_loss=0.01406, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4937550.88 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:51:26,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0
2023-12-23 11:51:34,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1120800.0, ans=0.0
2023-12-23 11:51:37,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1120800.0, ans=0.0
2023-12-23 11:51:43,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0
2023-12-23 11:51:47,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1120866.6666666667, ans=0.125
2023-12-23 11:52:16,840 INFO [train.py:886] (1/4) Epoch 36, batch 1350, loss[loss=0.009677, audio_tagging_loss=0.009677, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4943008.96 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:52:22,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1121066.6666666667, ans=0.0
2023-12-23 11:52:26,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1121133.3333333333, ans=0.2
2023-12-23 11:52:29,817 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.453e+01 3.612e+01 3.766e+01 4.357e+01, threshold=7.223e+01, percent-clipped=0.0
2023-12-23 11:52:51,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1121266.6666666667, ans=0.125
2023-12-23 11:52:58,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.25 vs. limit=6.0
2023-12-23 11:53:02,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1121333.3333333333, ans=0.2
2023-12-23 11:53:07,518 INFO [train.py:886] (1/4) Epoch 36, batch 1400, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4948802.52 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0
2023-12-23 11:53:23,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.74 vs. limit=10.0
2023-12-23 11:53:36,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0
2023-12-23 11:53:37,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1121533.3333333333, ans=0.125
2023-12-23 11:53:47,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1121600.0, ans=0.07
2023-12-23 11:53:48,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0
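A note on batch_count in the scaling.py entries: it advances by about 6.67 per optimizer step in this stretch, which matches max_duration x world_size / ref_duration = 1000 x 4 / 600 from the run parameters, i.e. it appears to measure training position in ref_duration-sized chunks of audio across all four GPUs (an inference from the numbers, not something the log states). As a quick check:

    # Inference from the logged numbers (not stated in the log): batch_count
    # advances by max_duration * world_size / ref_duration per training step.
    max_duration, world_size, ref_duration = 1000.0, 4, 600.0
    per_step = max_duration * world_size / ref_duration
    print(per_step)                  # 6.666...
    print(1106293.3333 - 1105960.0)  # 333.33 between batch 3850 and batch 3900
    print(50 * per_step)             # 333.33 -- consistent over those 50 steps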
limit=15.0 2023-12-23 11:53:49,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1121666.6666666667, ans=0.0 2023-12-23 11:53:50,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1121666.6666666667, ans=0.125 2023-12-23 11:53:54,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1121666.6666666667, ans=0.125 2023-12-23 11:53:55,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1121666.6666666667, ans=0.0 2023-12-23 11:53:58,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1121666.6666666667, ans=0.1 2023-12-23 11:53:59,932 INFO [train.py:886] (1/4) Epoch 36, batch 1450, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24916.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4952705.53 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:54:12,996 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.437e+01 3.604e+01 3.782e+01 4.556e+01, threshold=7.209e+01, percent-clipped=0.0 2023-12-23 11:54:19,537 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:54:25,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1121866.6666666667, ans=0.2 2023-12-23 11:54:27,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1121866.6666666667, ans=0.0 2023-12-23 11:54:33,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1121933.3333333333, ans=0.0 2023-12-23 11:54:38,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1121933.3333333333, ans=0.125 2023-12-23 11:54:39,707 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:54:42,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-12-23 11:54:50,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1122066.6666666667, ans=0.2 2023-12-23 11:54:50,917 INFO [train.py:886] (1/4) Epoch 36, batch 1500, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4954363.92 frames. 
], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:54:51,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1122066.6666666667, ans=10.0 2023-12-23 11:55:09,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122133.3333333333, ans=0.1 2023-12-23 11:55:30,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1122266.6666666667, ans=0.2 2023-12-23 11:55:42,814 INFO [train.py:886] (1/4) Epoch 36, batch 1550, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4949273.80 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:55:55,144 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.501e+01 3.689e+01 3.879e+01 4.418e+01, threshold=7.378e+01, percent-clipped=0.0 2023-12-23 11:55:58,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1122466.6666666667, ans=0.1 2023-12-23 11:56:18,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1122600.0, ans=0.0 2023-12-23 11:56:29,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122666.6666666667, ans=0.1 2023-12-23 11:56:34,940 INFO [train.py:886] (1/4) Epoch 36, batch 1600, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4945855.40 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:56:38,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122733.3333333333, ans=0.1 2023-12-23 11:56:45,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2023-12-23 11:56:50,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1122800.0, ans=0.125 2023-12-23 11:56:52,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1122800.0, ans=0.125 2023-12-23 11:56:56,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1122866.6666666667, ans=0.125 2023-12-23 11:57:04,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1122933.3333333333, ans=0.125 2023-12-23 11:57:24,975 INFO [train.py:886] (1/4) Epoch 36, batch 1650, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4943514.84 frames. 
], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:57:38,148 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.059e+01 3.477e+01 3.648e+01 3.896e+01 4.999e+01, threshold=7.295e+01, percent-clipped=0.0 2023-12-23 11:57:42,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1123133.3333333333, ans=0.0 2023-12-23 11:57:47,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-12-23 11:58:07,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1123333.3333333333, ans=0.1 2023-12-23 11:58:16,229 INFO [train.py:886] (1/4) Epoch 36, batch 1700, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4945680.73 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:58:25,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1123466.6666666667, ans=0.125 2023-12-23 11:58:34,432 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:58:41,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1123533.3333333333, ans=0.1 2023-12-23 11:58:52,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1123600.0, ans=0.0 2023-12-23 11:59:01,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1123666.6666666667, ans=0.125 2023-12-23 11:59:05,916 INFO [train.py:886] (1/4) Epoch 36, batch 1750, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4948082.38 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:59:08,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2023-12-23 11:59:20,315 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.178e+01 3.484e+01 3.617e+01 3.775e+01 4.286e+01, threshold=7.233e+01, percent-clipped=0.0 2023-12-23 11:59:34,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1123866.6666666667, ans=0.125 2023-12-23 11:59:36,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2023-12-23 11:59:42,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1123933.3333333333, ans=0.0 2023-12-23 11:59:43,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. 
limit=12.0 2023-12-23 11:59:44,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1123933.3333333333, ans=0.125 2023-12-23 11:59:44,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1123933.3333333333, ans=0.125 2023-12-23 11:59:46,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1124000.0, ans=0.125 2023-12-23 11:59:57,796 INFO [train.py:886] (1/4) Epoch 36, batch 1800, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4949552.14 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:00:07,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1124133.3333333333, ans=0.025 2023-12-23 12:00:11,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124133.3333333333, ans=0.1 2023-12-23 12:00:15,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0 2023-12-23 12:00:18,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1124200.0, ans=0.0 2023-12-23 12:00:24,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1124200.0, ans=0.1 2023-12-23 12:00:26,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1124200.0, ans=0.0 2023-12-23 12:00:48,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2023-12-23 12:00:48,661 INFO [train.py:886] (1/4) Epoch 36, batch 1850, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4950850.33 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:00:58,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1124466.6666666667, ans=0.035 2023-12-23 12:01:02,415 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.520e+01 3.688e+01 3.897e+01 4.478e+01, threshold=7.376e+01, percent-clipped=0.0 2023-12-23 12:01:06,508 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:01:20,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1124600.0, ans=0.125 2023-12-23 12:01:25,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1124600.0, ans=0.125 2023-12-23 12:01:29,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.29 vs. limit=10.0 2023-12-23 12:01:39,399 INFO [train.py:886] (1/4) Epoch 36, batch 1900, loss[loss=0.0103, audio_tagging_loss=0.0103, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4945007.30 frames. 
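The `optim.py:484` warnings report a five-number summary (min, 25%, median, 75%, max) of recent parameter-gradient norms. Note that each logged `threshold` is exactly `Clipping_scale` (2.0) times the logged median, e.g. 7.378e+01 = 2.0 x 3.689e+01 above. A sketch of that bookkeeping; the surrounding clipping logic belongs to icefall's optimizer and is not reproduced here.

```python
# Quartile summary of recent gradient norms with a threshold set to
# clipping_scale times the median, matching the relation visible in
# the "grad-norm quartiles ... threshold=..." warnings above.
import torch

def grad_norm_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()
    pct_clipped = (recent_norms > threshold).float().mean().item() * 100.0
    return q.tolist(), threshold, pct_clipped

norms = 30.0 + 15.0 * torch.rand(128)          # stand-in gradient norms
quartiles, threshold, pct = grad_norm_summary(norms)
print(f"quartiles {quartiles}, threshold={threshold:.3e}, "
      f"percent-clipped={pct:.1f}")
```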
], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:01:48,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-12-23 12:01:51,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1124800.0, ans=0.125 2023-12-23 12:01:56,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1124800.0, ans=0.125 2023-12-23 12:01:59,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1124800.0, ans=0.125 2023-12-23 12:02:00,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0 2023-12-23 12:02:26,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-12-23 12:02:32,623 INFO [train.py:886] (1/4) Epoch 36, batch 1950, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4940643.57 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:02:36,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1125066.6666666667, ans=0.0 2023-12-23 12:02:43,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1125133.3333333333, ans=0.125 2023-12-23 12:02:45,206 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.533e+01 3.647e+01 3.862e+01 4.201e+01, threshold=7.294e+01, percent-clipped=0.0 2023-12-23 12:03:04,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1125266.6666666667, ans=10.0 2023-12-23 12:03:06,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1125266.6666666667, ans=0.125 2023-12-23 12:03:09,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1125266.6666666667, ans=0.2 2023-12-23 12:03:15,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1125333.3333333333, ans=0.04949747468305833 2023-12-23 12:03:20,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-12-23 12:03:24,581 INFO [train.py:886] (1/4) Epoch 36, batch 2000, loss[loss=0.01208, audio_tagging_loss=0.01208, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4947775.43 frames. 
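The `scaling.py:1022` lines compare a whitening metric against a limit. One natural reading is covariance anisotropy: the mean squared eigenvalue of the per-group channel covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly "white" features and grows as a few directions dominate. The sketch below computes that quantity; icefall's actual Whiten module may use a different formula.

```python
# Covariance-anisotropy metric in the spirit of the
# "Whitening: ... metric=X vs. limit=Y" lines above (assumed formula).
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    n, c = x.shape                        # (frames, channels)
    g = c // num_groups
    x = x.reshape(n, num_groups, g).transpose(0, 1)   # (groups, frames, g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                   # per-group covariance
    # mean(eig^2) / mean(eig)^2: equals 1.0 iff all eigenvalues are equal.
    mean_eig_sq = (cov ** 2).sum(dim=(1, 2)) / g      # trace(C^2) / g
    mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1)
    return (mean_eig_sq / mean_eig ** 2).mean().item()

x = torch.randn(4000, 64)
print(whitening_metric(x))   # ~1.02 for white Gaussian features
```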
], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:03:40,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1125466.6666666667, ans=0.2 2023-12-23 12:03:57,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1125600.0, ans=0.125 2023-12-23 12:04:07,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2023-12-23 12:04:14,855 INFO [train.py:886] (1/4) Epoch 36, batch 2050, loss[loss=0.01194, audio_tagging_loss=0.01194, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4946174.83 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:04:28,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.473e+01 3.640e+01 3.754e+01 4.279e+01, threshold=7.281e+01, percent-clipped=0.0 2023-12-23 12:04:29,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1125800.0, ans=0.125 2023-12-23 12:04:30,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1125800.0, ans=0.125 2023-12-23 12:04:34,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1125866.6666666667, ans=10.0 2023-12-23 12:04:37,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1125866.6666666667, ans=0.0 2023-12-23 12:04:54,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1126000.0, ans=0.125 2023-12-23 12:05:06,212 INFO [train.py:886] (1/4) Epoch 36, batch 2100, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4949330.01 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:05:35,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1126200.0, ans=0.0 2023-12-23 12:05:36,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1126266.6666666667, ans=0.0 2023-12-23 12:05:37,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1126266.6666666667, ans=0.2 2023-12-23 12:05:39,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1126266.6666666667, ans=0.125 2023-12-23 12:05:53,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1126333.3333333333, ans=0.125 2023-12-23 12:05:58,009 INFO [train.py:886] (1/4) Epoch 36, batch 2150, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4943368.80 frames. 
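`grad_scale` doubles from 32.0 to 64.0 at batch 2000 above, reaches 128.0 at batch 4000 further down, and falls back to 64.0 by batch 4050: classic dynamic fp16 loss scaling, where the scale grows after a run of overflow-free steps and is halved when an inf/NaN gradient appears. A sketch using the standard torch.cuda.amp API; the hyperparameters shown are illustrative, not the run's actual settings.

```python
# Dynamic fp16 loss scaling behind the grad_scale values above,
# using the standard torch.cuda.amp.GradScaler API.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the grad_scale seen early in the epoch
    growth_factor=2.0,     # doubles after growth_interval clean steps
    backoff_factor=0.5,    # halves when inf/NaN gradients are detected
    growth_interval=2000,  # illustrative value
)

def fp16_step(model, optimizer, criterion, features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()   # backprop through the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on overflow
    scaler.update()                 # grows or backs off the scale
    return loss.detach(), scaler.get_scale()
```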
], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:06:06,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1126400.0, ans=0.0 2023-12-23 12:06:11,654 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.510e+01 3.673e+01 3.806e+01 4.496e+01, threshold=7.347e+01, percent-clipped=0.0 2023-12-23 12:06:20,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1126533.3333333333, ans=0.2 2023-12-23 12:06:24,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1126533.3333333333, ans=0.125 2023-12-23 12:06:25,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1126533.3333333333, ans=0.1 2023-12-23 12:06:33,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1126600.0, ans=0.5 2023-12-23 12:06:46,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1126666.6666666667, ans=0.125 2023-12-23 12:06:50,294 INFO [train.py:886] (1/4) Epoch 36, batch 2200, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4943524.29 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:07:10,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2023-12-23 12:07:11,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1126866.6666666667, ans=0.125 2023-12-23 12:07:34,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1127000.0, ans=0.125 2023-12-23 12:07:35,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.34 vs. limit=15.0 2023-12-23 12:07:38,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2023-12-23 12:07:41,586 INFO [train.py:886] (1/4) Epoch 36, batch 2250, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4941382.77 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:07:46,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1127066.6666666667, ans=0.125 2023-12-23 12:07:50,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.95 vs. 
limit=22.5 2023-12-23 12:07:54,577 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.130e+01 3.509e+01 3.634e+01 3.761e+01 4.553e+01, threshold=7.267e+01, percent-clipped=0.0 2023-12-23 12:07:57,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1127133.3333333333, ans=0.0 2023-12-23 12:08:18,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1127266.6666666667, ans=0.0 2023-12-23 12:08:33,258 INFO [train.py:886] (1/4) Epoch 36, batch 2300, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4945541.46 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:08:33,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1127400.0, ans=0.2 2023-12-23 12:08:36,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1127400.0, ans=10.0 2023-12-23 12:08:37,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1127400.0, ans=0.015 2023-12-23 12:09:21,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=1127666.6666666667, ans=0.2 2023-12-23 12:09:25,059 INFO [train.py:886] (1/4) Epoch 36, batch 2350, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4953203.17 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:09:32,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1127733.3333333333, ans=0.125 2023-12-23 12:09:39,021 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.388e+01 3.533e+01 3.734e+01 4.498e+01, threshold=7.065e+01, percent-clipped=0.0 2023-12-23 12:09:50,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1127866.6666666667, ans=0.125 2023-12-23 12:09:51,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1127866.6666666667, ans=0.125 2023-12-23 12:10:13,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1128000.0, ans=0.125 2023-12-23 12:10:17,019 INFO [train.py:886] (1/4) Epoch 36, batch 2400, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4955193.07 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:10:25,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1128066.6666666667, ans=0.0 2023-12-23 12:10:28,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2023-12-23 12:10:33,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. 
limit=15.0 2023-12-23 12:10:36,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1128133.3333333333, ans=0.125 2023-12-23 12:10:50,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-12-23 12:10:52,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1128266.6666666667, ans=0.0 2023-12-23 12:11:09,470 INFO [train.py:886] (1/4) Epoch 36, batch 2450, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4956739.87 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:11:09,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2023-12-23 12:11:12,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1128400.0, ans=0.0 2023-12-23 12:11:20,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2023-12-23 12:11:22,549 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.491e+01 3.672e+01 3.810e+01 4.386e+01, threshold=7.343e+01, percent-clipped=0.0 2023-12-23 12:11:23,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1128466.6666666667, ans=0.0 2023-12-23 12:11:32,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-12-23 12:11:42,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-12-23 12:11:49,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2023-12-23 12:12:02,007 INFO [train.py:886] (1/4) Epoch 36, batch 2500, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4955105.07 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:12:14,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. limit=10.0 2023-12-23 12:12:17,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1128800.0, ans=0.2 2023-12-23 12:12:19,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1128800.0, ans=0.125 2023-12-23 12:12:29,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2023-12-23 12:12:52,158 INFO [train.py:886] (1/4) Epoch 36, batch 2550, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4950710.70 frames. 
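A large share of the scheduled values above are `*_skip_rate` entries (conv, attention, ff2/ff3) together with `bypass` scale parameters. Read as layer-drop-style regularization, a submodule's output is randomly dropped with probability skip_rate during training, leaving only the residual path; by this point in training the logged rates have mostly decayed to 0.0. A hedged sketch of that mechanism, as an assumption about the role of these values rather than icefall's exact code:

```python
# Stochastic sub-module skipping in the spirit of the conv/attention/ff
# skip_rate values logged above.
import torch
import torch.nn as nn

class SkippableResidual(nn.Module):
    def __init__(self, module: nn.Module, skip_rate: float):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate   # a ScheduledFloat in the real recipe

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()).item() < self.skip_rate:
            return x                 # drop the branch: pure bypass
        return x + self.module(x)    # normal residual connection

layer = SkippableResidual(nn.Linear(256, 256), skip_rate=0.0)
print(layer(torch.randn(4, 256)).shape)   # skip_rate has decayed to 0 here
```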
], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:12:55,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2023-12-23 12:13:01,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1129066.6666666667, ans=0.5 2023-12-23 12:13:02,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1129066.6666666667, ans=0.125 2023-12-23 12:13:02,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129133.3333333333, ans=0.1 2023-12-23 12:13:06,438 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.554e+01 3.683e+01 3.809e+01 4.296e+01, threshold=7.365e+01, percent-clipped=0.0 2023-12-23 12:13:36,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1129333.3333333333, ans=0.0 2023-12-23 12:13:44,968 INFO [train.py:886] (1/4) Epoch 36, batch 2600, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4941740.98 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:13:49,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-12-23 12:13:49,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1129400.0, ans=0.125 2023-12-23 12:13:50,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1129400.0, ans=0.0 2023-12-23 12:14:01,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.99 vs. limit=22.5 2023-12-23 12:14:02,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1129466.6666666667, ans=0.0 2023-12-23 12:14:07,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-12-23 12:14:16,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1129600.0, ans=0.0 2023-12-23 12:14:20,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1129600.0, ans=0.0 2023-12-23 12:14:20,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=22.5 2023-12-23 12:14:22,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=12.0 2023-12-23 12:14:34,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129733.3333333333, ans=0.1 2023-12-23 12:14:35,675 INFO [train.py:886] (1/4) Epoch 36, batch 2650, loss[loss=0.01183, audio_tagging_loss=0.01183, over 23996.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4947597.84 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:14:43,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1129733.3333333333, ans=0.1 2023-12-23 12:14:48,682 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.201e+01 3.506e+01 3.658e+01 3.796e+01 4.295e+01, threshold=7.317e+01, percent-clipped=0.0 2023-12-23 12:15:02,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1129866.6666666667, ans=10.0 2023-12-23 12:15:26,045 INFO [train.py:886] (1/4) Epoch 36, batch 2700, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4945548.56 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:15:29,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1130066.6666666667, ans=0.0 2023-12-23 12:15:56,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1130266.6666666667, ans=0.125 2023-12-23 12:16:05,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1130333.3333333333, ans=0.1 2023-12-23 12:16:16,482 INFO [train.py:886] (1/4) Epoch 36, batch 2750, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4949450.55 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:16:29,407 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.206e+01 3.461e+01 3.594e+01 3.787e+01 4.376e+01, threshold=7.188e+01, percent-clipped=0.0 2023-12-23 12:16:29,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1130466.6666666667, ans=0.2 2023-12-23 12:16:31,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1130466.6666666667, ans=0.0 2023-12-23 12:17:06,864 INFO [train.py:886] (1/4) Epoch 36, batch 2800, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4942797.20 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:17:08,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1130733.3333333333, ans=0.1 2023-12-23 12:17:39,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1130933.3333333333, ans=0.125 2023-12-23 12:17:48,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1131000.0, ans=0.0 2023-12-23 12:17:48,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-12-23 12:17:59,706 INFO [train.py:886] (1/4) Epoch 36, batch 2850, loss[loss=0.01094, audio_tagging_loss=0.01094, over 22138.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4936762.87 frames. 
], batch size: 107, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:18:03,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1131066.6666666667, ans=0.0 2023-12-23 12:18:04,595 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:18:11,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1131133.3333333333, ans=0.0 2023-12-23 12:18:11,936 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.495e+01 3.630e+01 3.797e+01 4.361e+01, threshold=7.259e+01, percent-clipped=0.0 2023-12-23 12:18:38,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-12-23 12:18:46,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1131333.3333333333, ans=0.125 2023-12-23 12:18:48,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1131333.3333333333, ans=0.125 2023-12-23 12:18:52,302 INFO [train.py:886] (1/4) Epoch 36, batch 2900, loss[loss=0.01136, audio_tagging_loss=0.01136, over 22196.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4936934.23 frames. ], batch size: 107, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:19:01,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1131400.0, ans=0.0 2023-12-23 12:19:01,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-12-23 12:19:06,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0 2023-12-23 12:19:06,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1131466.6666666667, ans=0.125 2023-12-23 12:19:08,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1131466.6666666667, ans=0.0 2023-12-23 12:19:09,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1131466.6666666667, ans=0.125 2023-12-23 12:19:15,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1131533.3333333333, ans=0.125 2023-12-23 12:19:17,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1131533.3333333333, ans=0.2 2023-12-23 12:19:38,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1131666.6666666667, ans=0.125 2023-12-23 12:19:43,639 INFO [train.py:886] (1/4) Epoch 36, batch 2950, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4939731.83 frames. 
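The `balancer*.prob` values (mostly 0.125 above) pair with bounds such as `max_abs=10.0`, `min_abs=0.5`, and `min_positive=0.025`. One consistent reading is that an activation-constraint correction is applied only on a random `prob` fraction of batches, keeping its cost low. Below is a loose reconstruction of the max_abs half of that idea; icefall's real Balancer also enforces the positivity constraints through a custom backward pass, omitted here.

```python
# Randomly-applied activation constraint in the spirit of the
# balancer prob / max_abs values above (a loose reconstruction).
import torch

def balancer_penalty(x: torch.Tensor, max_abs: float = 10.0,
                     prob: float = 0.125) -> torch.Tensor:
    if torch.rand(()).item() >= prob:
        return x.new_zeros(())                    # usually a no-op
    # penalize channels whose mean |activation| exceeds max_abs
    excess = (x.abs().mean(dim=0) - max_abs).clamp(min=0.0)
    return excess.sum()

x = 20.0 * torch.randn(100, 8, requires_grad=True)
penalty = balancer_penalty(x)
print(float(penalty))   # nonzero on ~12.5% of calls for these large inputs
```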
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:19:50,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1131733.3333333333, ans=0.0 2023-12-23 12:19:55,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1131800.0, ans=0.1 2023-12-23 12:19:57,307 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.076e+01 3.440e+01 3.602e+01 3.791e+01 4.339e+01, threshold=7.205e+01, percent-clipped=0.0 2023-12-23 12:20:09,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1131866.6666666667, ans=0.1 2023-12-23 12:20:14,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1131933.3333333333, ans=0.025 2023-12-23 12:20:18,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1131933.3333333333, ans=0.015 2023-12-23 12:20:28,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1132000.0, ans=0.125 2023-12-23 12:20:36,176 INFO [train.py:886] (1/4) Epoch 36, batch 3000, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4944191.12 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:20:36,177 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 12:20:57,375 INFO [train.py:917] (1/4) Epoch 36, validation: loss=0.0342, audio_tagging_loss=0.0342, over 3737520.00 frames. 2023-12-23 12:20:57,376 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 12:21:19,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1132200.0, ans=0.125 2023-12-23 12:21:48,963 INFO [train.py:886] (1/4) Epoch 36, batch 3050, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4953786.37 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:22:02,617 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.482e+01 3.618e+01 3.774e+01 4.339e+01, threshold=7.235e+01, percent-clipped=0.0 2023-12-23 12:22:05,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0 2023-12-23 12:22:21,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132600.0, ans=0.1 2023-12-23 12:22:23,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1132600.0, ans=0.0 2023-12-23 12:22:40,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1132733.3333333333, ans=0.2 2023-12-23 12:22:41,232 INFO [train.py:886] (1/4) Epoch 36, batch 3100, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4954268.71 frames. 
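At batch 3000 the loop pauses for validation (`train.py:909/917/918` above): a frame-weighted loss over the dev set (0.0342 over 3,737,520 frames) plus the peak CUDA memory counter (14765MB). A structural sketch; the loop shape and the sum-reduced criterion are assumptions, while the torch calls themselves are standard.

```python
# Sketch of the validation pass reported by train.py:909-918 above.
import torch

@torch.no_grad()
def compute_validation_loss(model, dev_loader, criterion, device):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    for features, targets, num_frames in dev_loader:
        logits = model(features.to(device))
        loss_sum += criterion(logits, targets.to(device)).item()  # sum-reduced
        frames += float(num_frames)
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={loss_sum / frames:.4f}, "
          f"over {frames:.2f} frames.")
    print(f"Maximum memory allocated so far is {mem_mb}MB")
    return loss_sum / frames
```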
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:22:57,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1132800.0, ans=0.125 2023-12-23 12:23:09,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1132866.6666666667, ans=0.2 2023-12-23 12:23:32,814 INFO [train.py:886] (1/4) Epoch 36, batch 3150, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24030.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4945881.14 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:23:34,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1133066.6666666667, ans=0.09899494936611666 2023-12-23 12:23:42,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1133133.3333333333, ans=0.125 2023-12-23 12:23:46,312 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.140e+01 3.564e+01 3.708e+01 3.854e+01 4.503e+01, threshold=7.417e+01, percent-clipped=0.0 2023-12-23 12:24:08,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1133266.6666666667, ans=0.1 2023-12-23 12:24:23,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1133400.0, ans=0.0 2023-12-23 12:24:24,401 INFO [train.py:886] (1/4) Epoch 36, batch 3200, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4942964.47 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:25:01,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2023-12-23 12:25:15,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.12 vs. limit=10.0 2023-12-23 12:25:16,145 INFO [train.py:886] (1/4) Epoch 36, batch 3250, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4937328.66 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:25:17,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1133733.3333333333, ans=0.125 2023-12-23 12:25:18,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1133733.3333333333, ans=0.125 2023-12-23 12:25:29,901 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.117e+01 3.485e+01 3.604e+01 3.735e+01 4.433e+01, threshold=7.209e+01, percent-clipped=0.0 2023-12-23 12:25:50,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1133933.3333333333, ans=0.2 2023-12-23 12:26:00,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1134000.0, ans=0.0 2023-12-23 12:26:07,788 INFO [train.py:886] (1/4) Epoch 36, batch 3300, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4948098.67 frames. 
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:26:35,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0 2023-12-23 12:26:36,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-12-23 12:26:52,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1134333.3333333333, ans=0.2 2023-12-23 12:26:58,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-23 12:27:00,485 INFO [train.py:886] (1/4) Epoch 36, batch 3350, loss[loss=0.009598, audio_tagging_loss=0.009598, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4956245.08 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:27:08,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1134400.0, ans=0.2 2023-12-23 12:27:10,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1134466.6666666667, ans=0.125 2023-12-23 12:27:13,422 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.479e+01 3.642e+01 3.791e+01 4.255e+01, threshold=7.283e+01, percent-clipped=0.0 2023-12-23 12:27:53,067 INFO [train.py:886] (1/4) Epoch 36, batch 3400, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4954198.45 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:27:54,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-12-23 12:27:59,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1134733.3333333333, ans=0.0 2023-12-23 12:28:01,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-12-23 12:28:01,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1134800.0, ans=0.125 2023-12-23 12:28:08,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1134800.0, ans=0.5 2023-12-23 12:28:38,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1135000.0, ans=0.0 2023-12-23 12:28:45,264 INFO [train.py:886] (1/4) Epoch 36, batch 3450, loss[loss=0.01357, audio_tagging_loss=0.01357, over 21773.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4945996.54 frames. 
], batch size: 107, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:28:46,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1135066.6666666667, ans=0.1 2023-12-23 12:28:47,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1135066.6666666667, ans=0.2 2023-12-23 12:28:55,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1135133.3333333333, ans=0.0 2023-12-23 12:28:58,960 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.129e+01 3.599e+01 3.745e+01 3.958e+01 4.783e+01, threshold=7.490e+01, percent-clipped=0.0 2023-12-23 12:29:00,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1135133.3333333333, ans=0.125 2023-12-23 12:29:21,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1135266.6666666667, ans=0.035 2023-12-23 12:29:27,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0 2023-12-23 12:29:29,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1135333.3333333333, ans=0.125 2023-12-23 12:29:34,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1135333.3333333333, ans=0.2 2023-12-23 12:29:37,384 INFO [train.py:886] (1/4) Epoch 36, batch 3500, loss[loss=0.01182, audio_tagging_loss=0.01182, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4943483.54 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:29:56,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-12-23 12:30:07,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2023-12-23 12:30:18,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1135666.6666666667, ans=0.125 2023-12-23 12:30:18,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1135666.6666666667, ans=0.0 2023-12-23 12:30:28,990 INFO [train.py:886] (1/4) Epoch 36, batch 3550, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4942520.03 frames. 
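The learning rate drifts from 3.00e-03 at batch 1450 above down to 2.98e-03 later in the epoch (and 2.97e-03 further on). icefall's Eden scheduler decays the rate with both batch index and epoch via inverse-quarter-power factors; the sketch below shows that shape with the lr_batches=7500 and lr_epochs=3.5 configured for this run, but it is not claimed to reproduce the exact values logged here.

```python
# Eden-style learning-rate shape (illustrative; not verified against
# the exact lr values in this log).
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

for epoch in (1, 10, 36):
    print(epoch, eden_lr(0.045, batch=30000, epoch=float(epoch)))
```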
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:30:37,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1135733.3333333333, ans=0.1 2023-12-23 12:30:42,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:30:42,832 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.480e+01 3.652e+01 3.812e+01 4.664e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 12:31:09,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1135933.3333333333, ans=0.125 2023-12-23 12:31:09,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0 2023-12-23 12:31:09,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1136000.0, ans=0.0 2023-12-23 12:31:17,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1136000.0, ans=0.2 2023-12-23 12:31:21,219 INFO [train.py:886] (1/4) Epoch 36, batch 3600, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4947814.16 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:31:23,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136066.6666666667, ans=0.1 2023-12-23 12:31:24,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1136066.6666666667, ans=0.2 2023-12-23 12:31:34,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2023-12-23 12:31:38,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1136133.3333333333, ans=0.125 2023-12-23 12:31:48,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1136200.0, ans=0.125 2023-12-23 12:32:13,783 INFO [train.py:886] (1/4) Epoch 36, batch 3650, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4953887.75 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:32:24,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-12-23 12:32:26,860 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.117e+01 3.484e+01 3.633e+01 3.763e+01 4.234e+01, threshold=7.265e+01, percent-clipped=0.0 2023-12-23 12:32:44,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-12-23 12:32:53,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1136600.0, ans=0.125 2023-12-23 12:33:04,712 INFO [train.py:886] (1/4) Epoch 36, batch 3700, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. 
], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4958347.48 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:33:21,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1136800.0, ans=0.125 2023-12-23 12:33:22,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136800.0, ans=0.1 2023-12-23 12:33:36,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136933.3333333333, ans=0.1 2023-12-23 12:33:42,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1136933.3333333333, ans=0.125 2023-12-23 12:33:50,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1137000.0, ans=0.5 2023-12-23 12:33:51,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1137000.0, ans=0.2 2023-12-23 12:33:57,649 INFO [train.py:886] (1/4) Epoch 36, batch 3750, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4950524.47 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:34:08,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1137133.3333333333, ans=0.0 2023-12-23 12:34:09,919 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.235e+01 3.553e+01 3.740e+01 3.871e+01 4.273e+01, threshold=7.479e+01, percent-clipped=0.0 2023-12-23 12:34:10,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137133.3333333333, ans=0.1 2023-12-23 12:34:10,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.71 vs. limit=10.0 2023-12-23 12:34:16,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1137133.3333333333, ans=0.0 2023-12-23 12:34:20,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1137200.0, ans=0.125 2023-12-23 12:34:21,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137200.0, ans=0.125 2023-12-23 12:34:28,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1137266.6666666667, ans=0.125 2023-12-23 12:34:45,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-12-23 12:34:49,190 INFO [train.py:886] (1/4) Epoch 36, batch 3800, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4943777.50 frames. 
], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:34:51,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1137400.0, ans=0.125 2023-12-23 12:34:51,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137400.0, ans=0.1 2023-12-23 12:35:06,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1137466.6666666667, ans=0.0 2023-12-23 12:35:24,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1137600.0, ans=0.2 2023-12-23 12:35:40,319 INFO [train.py:886] (1/4) Epoch 36, batch 3850, loss[loss=0.01174, audio_tagging_loss=0.01174, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4947680.56 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:35:44,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1137733.3333333333, ans=10.0 2023-12-23 12:35:54,884 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.476e+01 3.694e+01 3.926e+01 4.509e+01, threshold=7.388e+01, percent-clipped=0.0 2023-12-23 12:36:22,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1138000.0, ans=0.0 2023-12-23 12:36:33,081 INFO [train.py:886] (1/4) Epoch 36, batch 3900, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4953825.09 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:36:38,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1138066.6666666667, ans=0.125 2023-12-23 12:36:40,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.11 vs. limit=10.0 2023-12-23 12:36:49,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1138133.3333333333, ans=0.0 2023-12-23 12:37:05,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1138266.6666666667, ans=0.125 2023-12-23 12:37:13,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1138333.3333333333, ans=0.1 2023-12-23 12:37:20,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=12.0 2023-12-23 12:37:20,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1138333.3333333333, ans=0.0 2023-12-23 12:37:23,438 INFO [train.py:886] (1/4) Epoch 36, batch 3950, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4961299.49 frames. 
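Throughout the epoch, `loss` and `audio_tagging_loss` are identical, i.e. the tagging objective is the only loss term. For a multi-label task over AudioSet's 527 event classes the natural criterion is binary cross-entropy on clip-level logits; the sketch below shows that assumption, and the normalization against the "over N frames" counts is likewise assumed rather than taken from the recipe.

```python
# Multi-label audio-tagging criterion consistent with the
# loss == audio_tagging_loss pairs above (assumed, not confirmed code).
import torch
import torch.nn.functional as F

num_events = 527                                  # AudioSet classes
logits = torch.randn(100, num_events)             # one row per cut
targets = (torch.rand(100, num_events) > 0.98).float()   # sparse labels
loss_sum = F.binary_cross_entropy_with_logits(logits, targets,
                                              reduction="sum")
frames = 25000.0
print(f"audio_tagging_loss={loss_sum.item() / frames:.5f}, "
      f"over {frames:.2f} frames.")
```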
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:37:23,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1138400.0, ans=0.125 2023-12-23 12:37:29,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=12.0 2023-12-23 12:37:37,660 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.120e+01 3.438e+01 3.584e+01 3.728e+01 4.194e+01, threshold=7.169e+01, percent-clipped=0.0 2023-12-23 12:37:42,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1138466.6666666667, ans=0.125 2023-12-23 12:37:42,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1138466.6666666667, ans=0.125 2023-12-23 12:38:07,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1138666.6666666667, ans=0.0 2023-12-23 12:38:16,575 INFO [train.py:886] (1/4) Epoch 36, batch 4000, loss[loss=0.01025, audio_tagging_loss=0.01025, over 21609.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4953227.02 frames. ], batch size: 107, lr: 2.97e-03, grad_scale: 128.0 2023-12-23 12:38:24,361 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:38:25,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1138800.0, ans=0.125 2023-12-23 12:38:28,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1138800.0, ans=0.0 2023-12-23 12:38:49,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1138933.3333333333, ans=0.125 2023-12-23 12:38:54,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2023-12-23 12:39:07,336 INFO [train.py:886] (1/4) Epoch 36, batch 4050, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4955236.23 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:39:19,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1139133.3333333333, ans=0.2 2023-12-23 12:39:20,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1139133.3333333333, ans=0.125 2023-12-23 12:39:22,657 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.561e+01 3.698e+01 3.887e+01 4.580e+01, threshold=7.397e+01, percent-clipped=0.0 2023-12-23 12:39:48,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1139266.6666666667, ans=0.125 2023-12-23 12:39:49,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1139333.3333333333, ans=0.0 2023-12-23 12:39:59,478 INFO [train.py:886] (1/4) Epoch 36, batch 4100, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4952030.19 frames. 
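The `scaling.py:1118` "WithLoss" entries attach a named auxiliary loss to the attention weights and report its running sum (0.000e+00 everywhere in this stretch). One way to add such a penalty without changing the forward activations is a custom autograd function whose backward feeds the auxiliary loss a gradient of one; this sketch shows the technique only and is an assumption about what icefall's WithLoss does.

```python
# Identity-in-forward wrapper that injects an auxiliary loss during
# backward, a plausible mechanism for the "WithLoss ... loss-sum=..."
# diagnostics above (assumed, not icefall's verified implementation).
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(aux_loss)
        return x.view_as(x)          # identity; new tensor object for autograd

    @staticmethod
    def backward(ctx, grad_out):
        (aux_loss,) = ctx.saved_tensors
        # x passes gradients through unchanged; aux_loss receives grad 1,
        # which is equivalent to adding it to the total training loss.
        return grad_out, torch.ones_like(aux_loss)

x = torch.randn(3, requires_grad=True)
penalty = 0.01 * (x ** 2).sum()      # e.g. a penalty on attention weights
y = WithAuxLoss.apply(x, penalty)
y.sum().backward()
print(x.grad)                        # ones plus the 0.02 * x penalty term
```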
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:40:07,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1139400.0, ans=10.0 2023-12-23 12:40:28,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1139533.3333333333, ans=0.125 2023-12-23 12:40:31,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1139600.0, ans=0.125 2023-12-23 12:40:32,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-12-23 12:40:34,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2023-12-23 12:40:38,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1139600.0, ans=0.1 2023-12-23 12:40:49,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1139666.6666666667, ans=0.125 2023-12-23 12:40:51,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-12-23 12:40:52,660 INFO [train.py:886] (1/4) Epoch 36, batch 4150, loss[loss=0.009158, audio_tagging_loss=0.009158, over 22613.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4951282.19 frames. ], batch size: 107, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:41:06,034 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.179e+01 3.544e+01 3.659e+01 3.852e+01 4.683e+01, threshold=7.319e+01, percent-clipped=0.0 2023-12-23 12:41:13,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1139866.6666666667, ans=0.0 2023-12-23 12:41:36,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1140000.0, ans=0.125 2023-12-23 12:41:37,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1140000.0, ans=0.1 2023-12-23 12:41:43,951 INFO [train.py:886] (1/4) Epoch 36, batch 4200, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4951061.98 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:41:53,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=22.5 2023-12-23 12:42:01,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.03 vs. 
limit=15.0 2023-12-23 12:42:12,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1140200.0, ans=0.125 2023-12-23 12:42:21,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1140266.6666666667, ans=0.125 2023-12-23 12:42:26,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-23 12:42:34,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1140333.3333333333, ans=0.125 2023-12-23 12:42:35,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-23 12:42:36,326 INFO [train.py:886] (1/4) Epoch 36, batch 4250, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4950493.08 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:42:37,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1140400.0, ans=10.0 2023-12-23 12:42:41,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1140400.0, ans=0.125 2023-12-23 12:42:45,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1140466.6666666667, ans=0.2 2023-12-23 12:42:45,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2023-12-23 12:42:50,260 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.180e+01 3.490e+01 3.625e+01 3.785e+01 4.316e+01, threshold=7.251e+01, percent-clipped=0.0 2023-12-23 12:42:50,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1140466.6666666667, ans=0.0 2023-12-23 12:42:57,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2023-12-23 12:42:58,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-12-23 12:43:05,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-12-23 12:43:09,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=8.0 2023-12-23 12:43:27,474 INFO [train.py:886] (1/4) Epoch 36, batch 4300, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4954600.21 frames. 
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:43:54,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1140866.6666666667, ans=0.125 2023-12-23 12:44:01,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-12-23 12:44:15,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1141000.0, ans=0.125 2023-12-23 12:44:16,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1141000.0, ans=0.125 2023-12-23 12:44:18,285 INFO [train.py:886] (1/4) Epoch 36, batch 4350, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4955183.99 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:44:26,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-23 12:44:32,760 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.183e+01 3.485e+01 3.639e+01 3.859e+01 4.692e+01, threshold=7.279e+01, percent-clipped=0.0 2023-12-23 12:44:48,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1141266.6666666667, ans=0.5 2023-12-23 12:45:08,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1141333.3333333333, ans=0.0 2023-12-23 12:45:08,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1141333.3333333333, ans=0.125 2023-12-23 12:45:09,870 INFO [train.py:886] (1/4) Epoch 36, batch 4400, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4944775.94 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:45:15,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1141400.0, ans=0.0 2023-12-23 12:45:36,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1141533.3333333333, ans=10.0 2023-12-23 12:45:54,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1141666.6666666667, ans=0.04949747468305833 2023-12-23 12:45:55,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1141666.6666666667, ans=0.0 2023-12-23 12:46:01,349 INFO [train.py:886] (1/4) Epoch 36, batch 4450, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4943849.34 frames. 
], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:46:06,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1141733.3333333333, ans=0.1 2023-12-23 12:46:15,971 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.197e+01 3.605e+01 3.750e+01 3.906e+01 4.617e+01, threshold=7.499e+01, percent-clipped=0.0 2023-12-23 12:46:42,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1142000.0, ans=0.07 2023-12-23 12:46:53,751 INFO [train.py:886] (1/4) Epoch 36, batch 4500, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4937511.92 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:46:53,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1142066.6666666667, ans=0.125 2023-12-23 12:46:56,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2023-12-23 12:47:17,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.06 vs. limit=10.0 2023-12-23 12:47:23,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2023-12-23 12:47:29,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1142266.6666666667, ans=0.125 2023-12-23 12:47:36,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:47:45,892 INFO [train.py:886] (1/4) Epoch 36, batch 4550, loss[loss=0.01, audio_tagging_loss=0.01, over 22179.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4940938.98 frames. ], batch size: 107, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:47:46,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1142400.0, ans=0.0 2023-12-23 12:47:53,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1142400.0, ans=0.2 2023-12-23 12:47:57,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1142466.6666666667, ans=0.125 2023-12-23 12:48:00,332 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.540e+01 3.639e+01 3.809e+01 4.565e+01, threshold=7.278e+01, percent-clipped=0.0 2023-12-23 12:48:02,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1142466.6666666667, ans=0.0 2023-12-23 12:48:02,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0 2023-12-23 12:48:13,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. 
limit=15.0 2023-12-23 12:48:13,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2023-12-23 12:48:17,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1142600.0, ans=0.0 2023-12-23 12:48:37,320 INFO [train.py:886] (1/4) Epoch 36, batch 4600, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4948314.27 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:48:43,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1142733.3333333333, ans=0.125 2023-12-23 12:48:50,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1142800.0, ans=0.125 2023-12-23 12:49:07,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2023-12-23 12:49:08,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1142933.3333333333, ans=0.2 2023-12-23 12:49:13,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1142933.3333333333, ans=0.125 2023-12-23 12:49:29,419 INFO [train.py:886] (1/4) Epoch 36, batch 4650, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4953384.72 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:49:38,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1143133.3333333333, ans=0.125 2023-12-23 12:49:38,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1143133.3333333333, ans=0.125 2023-12-23 12:49:43,309 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.146e+01 3.508e+01 3.620e+01 3.811e+01 5.127e+01, threshold=7.240e+01, percent-clipped=0.0 2023-12-23 12:49:55,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1143200.0, ans=0.0 2023-12-23 12:49:56,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1143200.0, ans=0.125 2023-12-23 12:49:59,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1143266.6666666667, ans=0.0 2023-12-23 12:50:14,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2023-12-23 12:50:19,605 INFO [train.py:886] (1/4) Epoch 36, batch 4700, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24945.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4954155.08 frames. 
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:50:19,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1143400.0, ans=0.125 2023-12-23 12:50:37,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-12-23 12:50:39,452 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:50:52,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1143600.0, ans=0.05 2023-12-23 12:51:06,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1143733.3333333333, ans=0.125 2023-12-23 12:51:06,990 INFO [train.py:886] (1/4) Epoch 36, batch 4750, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4948792.42 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:51:14,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1143733.3333333333, ans=0.0 2023-12-23 12:51:17,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=12.0 2023-12-23 12:51:19,670 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.564e+01 3.739e+01 3.867e+01 4.563e+01, threshold=7.477e+01, percent-clipped=0.0 2023-12-23 12:51:19,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1143800.0, ans=0.0 2023-12-23 12:51:19,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1143800.0, ans=0.125 2023-12-23 12:51:42,460 INFO [train.py:886] (1/4) Epoch 37, batch 0, loss[loss=0.02885, audio_tagging_loss=0.02885, over 21450.00 frames. ], tot_loss[loss=0.02885, audio_tagging_loss=0.02885, over 21450.00 frames. ], batch size: 107, lr: 2.93e-03, grad_scale: 32.0 2023-12-23 12:51:42,461 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 12:51:52,483 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3570, 4.6432, 5.2251, 4.7402], device='cuda:1') 2023-12-23 12:52:03,035 INFO [train.py:917] (1/4) Epoch 37, validation: loss=0.03436, audio_tagging_loss=0.03436, over 3737520.00 frames. 
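A note on the recurring WARNING lines from optim.py:484 in this log: each one reports the min/25%/50%/75%/max quartiles of the gradient norms seen over a recent window of batches, together with the clipping threshold in force and the percentage of batches whose gradient norm exceeded it. In this log the threshold consistently equals clipping_scale times the median quartile, e.g. 2.0 x 3.694e+01 = 7.388e+01 in the warning at 12:35:54, and 2.0 x 4.552e+01 = 9.104e+01 in the epoch-37 warning at 12:53:42 (where percent-clipped briefly rises to 7.0 as the new epoch starts). The snippet below is a minimal sketch of that diagnostic, not icefall's actual optim.py: the window of recent norms, the helper names, and the threshold rule (scale times median, inferred from the logged numbers) are assumptions.

import torch

def grad_norm_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # recent_norms: 1-D float tensor of per-batch gradient norms over a
    # recent window (window size is an assumption; the log does not show it).
    q = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0], dtype=recent_norms.dtype)
    quartiles = torch.quantile(recent_norms, q)          # min/25%/50%/75%/max
    threshold = clipping_scale * quartiles[2]            # inferred rule: scale * median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

def clip_factor(grad_norm: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    # Scale factor applied to a gradient whose norm exceeds the threshold;
    # gradients at or below the threshold pass through unchanged (factor 1.0).
    return torch.clamp(threshold / (grad_norm + 1e-20), max=1.0)

As a check against the log, a window whose median gradient norm is 3.694e+01 with clipping_scale=2.0 reproduces the logged threshold of 7.388e+01, and percent-clipped=0.0 corresponds to no norms in the window exceeding that value.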
2023-12-23 12:52:03,036 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 12:52:04,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1143840.0, ans=0.2 2023-12-23 12:52:14,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1143906.6666666667, ans=0.125 2023-12-23 12:52:16,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1143906.6666666667, ans=0.0 2023-12-23 12:52:19,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1143906.6666666667, ans=0.125 2023-12-23 12:52:31,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1143973.3333333333, ans=0.125 2023-12-23 12:52:47,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1144106.6666666667, ans=0.0 2023-12-23 12:52:53,469 INFO [train.py:886] (1/4) Epoch 37, batch 50, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01857, audio_tagging_loss=0.01857, over 1115691.43 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:53:04,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1144240.0, ans=0.1 2023-12-23 12:53:12,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1144240.0, ans=0.95 2023-12-23 12:53:22,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1144306.6666666667, ans=0.125 2023-12-23 12:53:26,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1144373.3333333333, ans=0.0 2023-12-23 12:53:36,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1144440.0, ans=0.125 2023-12-23 12:53:39,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1144440.0, ans=0.0 2023-12-23 12:53:42,401 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.737e+01 4.184e+01 4.552e+01 5.178e+01 9.780e+01, threshold=9.104e+01, percent-clipped=7.0 2023-12-23 12:53:44,073 INFO [train.py:886] (1/4) Epoch 37, batch 100, loss[loss=0.01474, audio_tagging_loss=0.01474, over 25000.00 frames. ], tot_loss[loss=0.01641, audio_tagging_loss=0.01641, over 1973334.51 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:53:59,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1144573.3333333333, ans=0.0 2023-12-23 12:54:01,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1144573.3333333333, ans=0.1 2023-12-23 12:54:20,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.75 vs. limit=10.0 2023-12-23 12:54:34,845 INFO [train.py:886] (1/4) Epoch 37, batch 150, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. 
], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 2638138.95 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:55:00,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1144973.3333333333, ans=0.0 2023-12-23 12:55:14,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1145106.6666666667, ans=0.2 2023-12-23 12:55:24,538 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.092e+01 3.546e+01 3.734e+01 3.978e+01 4.632e+01, threshold=7.469e+01, percent-clipped=0.0 2023-12-23 12:55:25,503 INFO [train.py:886] (1/4) Epoch 37, batch 200, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 3155970.58 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:55:26,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1145173.3333333333, ans=0.0 2023-12-23 12:55:29,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1145173.3333333333, ans=0.2 2023-12-23 12:55:38,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1145240.0, ans=0.125 2023-12-23 12:56:16,878 INFO [train.py:886] (1/4) Epoch 37, batch 250, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 3557726.52 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:56:19,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1145506.6666666667, ans=0.0 2023-12-23 12:56:53,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1145706.6666666667, ans=0.125 2023-12-23 12:56:57,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1145773.3333333333, ans=0.0 2023-12-23 12:57:07,246 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.485e+01 3.632e+01 3.797e+01 5.071e+01, threshold=7.264e+01, percent-clipped=0.0 2023-12-23 12:57:08,197 INFO [train.py:886] (1/4) Epoch 37, batch 300, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 3866161.52 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:57:08,413 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:57:27,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1145906.6666666667, ans=0.125 2023-12-23 12:57:27,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1145906.6666666667, ans=0.125 2023-12-23 12:57:30,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. 
limit=10.0 2023-12-23 12:57:51,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1146106.6666666667, ans=0.05 2023-12-23 12:57:59,464 INFO [train.py:886] (1/4) Epoch 37, batch 350, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4098662.29 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:58:09,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1146240.0, ans=0.125 2023-12-23 12:58:14,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1146240.0, ans=0.125 2023-12-23 12:58:15,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1146240.0, ans=0.125 2023-12-23 12:58:24,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1146306.6666666667, ans=0.1 2023-12-23 12:58:25,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-12-23 12:58:26,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0 2023-12-23 12:58:37,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1146373.3333333333, ans=0.125 2023-12-23 12:58:42,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1146440.0, ans=0.125 2023-12-23 12:58:49,556 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.540e+01 3.697e+01 3.881e+01 4.233e+01, threshold=7.395e+01, percent-clipped=0.0 2023-12-23 12:58:49,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1146506.6666666667, ans=0.125 2023-12-23 12:58:50,510 INFO [train.py:886] (1/4) Epoch 37, batch 400, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4286949.24 frames. 
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:58:52,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1146506.6666666667, ans=0.0 2023-12-23 12:58:57,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1146506.6666666667, ans=0.1 2023-12-23 12:59:18,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1146640.0, ans=0.125 2023-12-23 12:59:19,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1146640.0, ans=0.1 2023-12-23 12:59:22,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1146706.6666666667, ans=0.1 2023-12-23 12:59:27,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1146706.6666666667, ans=0.95 2023-12-23 12:59:37,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2023-12-23 12:59:37,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2023-12-23 12:59:38,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-12-23 12:59:43,041 INFO [train.py:886] (1/4) Epoch 37, batch 450, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4437375.54 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 12:59:55,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1146906.6666666667, ans=0.125 2023-12-23 12:59:57,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1146906.6666666667, ans=0.1 2023-12-23 13:00:33,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1147106.6666666667, ans=0.125 2023-12-23 13:00:34,449 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.542e+01 3.669e+01 3.831e+01 4.789e+01, threshold=7.338e+01, percent-clipped=0.0 2023-12-23 13:00:35,453 INFO [train.py:886] (1/4) Epoch 37, batch 500, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4556478.44 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:00:36,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1147173.3333333333, ans=0.125 2023-12-23 13:00:54,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1147240.0, ans=0.04949747468305833 2023-12-23 13:01:06,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. 
limit=6.0 2023-12-23 13:01:07,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1147373.3333333333, ans=0.07 2023-12-23 13:01:11,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1147373.3333333333, ans=0.125 2023-12-23 13:01:27,942 INFO [train.py:886] (1/4) Epoch 37, batch 550, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4643676.54 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:01:29,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-12-23 13:01:43,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1147573.3333333333, ans=0.1 2023-12-23 13:01:43,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1147573.3333333333, ans=0.125 2023-12-23 13:02:02,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1147706.6666666667, ans=0.02 2023-12-23 13:02:04,312 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:02:13,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1147773.3333333333, ans=0.125 2023-12-23 13:02:14,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1147773.3333333333, ans=0.125 2023-12-23 13:02:17,067 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.550e+01 3.713e+01 3.833e+01 4.389e+01, threshold=7.427e+01, percent-clipped=0.0 2023-12-23 13:02:18,068 INFO [train.py:886] (1/4) Epoch 37, batch 600, loss[loss=0.0114, audio_tagging_loss=0.0114, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4710953.17 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:02:18,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1147840.0, ans=0.0 2023-12-23 13:02:21,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1147840.0, ans=0.125 2023-12-23 13:02:36,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-12-23 13:02:38,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1147973.3333333333, ans=0.125 2023-12-23 13:02:49,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. 
limit=15.0 2023-12-23 13:02:54,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1148040.0, ans=0.125 2023-12-23 13:03:02,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1148106.6666666667, ans=0.125 2023-12-23 13:03:03,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=12.0 2023-12-23 13:03:10,623 INFO [train.py:886] (1/4) Epoch 37, batch 650, loss[loss=0.01282, audio_tagging_loss=0.01282, over 22390.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4756061.99 frames. ], batch size: 107, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:03:16,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-12-23 13:03:47,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:03:48,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1148373.3333333333, ans=0.125 2023-12-23 13:04:00,810 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.542e+01 3.681e+01 3.830e+01 5.017e+01, threshold=7.361e+01, percent-clipped=0.0 2023-12-23 13:04:01,834 INFO [train.py:886] (1/4) Epoch 37, batch 700, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4794793.52 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:04:11,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1148506.6666666667, ans=0.07 2023-12-23 13:04:19,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1148573.3333333333, ans=0.035 2023-12-23 13:04:25,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1148640.0, ans=10.0 2023-12-23 13:04:27,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1148640.0, ans=0.125 2023-12-23 13:04:32,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1148706.6666666667, ans=0.0 2023-12-23 13:04:35,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-12-23 13:04:48,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1148773.3333333333, ans=10.0 2023-12-23 13:04:48,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1148773.3333333333, ans=0.125 2023-12-23 13:04:50,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1148773.3333333333, ans=0.125 2023-12-23 13:04:54,211 INFO [train.py:886] (1/4) Epoch 37, batch 750, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4826386.51 frames. 
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:04:57,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1148840.0, ans=0.0 2023-12-23 13:04:57,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1148840.0, ans=0.1 2023-12-23 13:04:59,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-12-23 13:05:08,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1148906.6666666667, ans=0.125 2023-12-23 13:05:33,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1149040.0, ans=0.2 2023-12-23 13:05:38,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1149106.6666666667, ans=0.0 2023-12-23 13:05:45,671 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.160e+01 3.496e+01 3.641e+01 3.857e+01 4.376e+01, threshold=7.282e+01, percent-clipped=0.0 2023-12-23 13:05:46,708 INFO [train.py:886] (1/4) Epoch 37, batch 800, loss[loss=0.009799, audio_tagging_loss=0.009799, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4847961.85 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:06:10,239 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:06:28,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1149440.0, ans=0.1 2023-12-23 13:06:28,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1149440.0, ans=0.1 2023-12-23 13:06:33,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1149440.0, ans=0.02 2023-12-23 13:06:38,682 INFO [train.py:886] (1/4) Epoch 37, batch 850, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4873562.84 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:06:48,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1149506.6666666667, ans=0.0 2023-12-23 13:06:54,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1149573.3333333333, ans=0.125 2023-12-23 13:07:11,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1149706.6666666667, ans=0.0 2023-12-23 13:07:29,569 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.523e+01 3.659e+01 3.806e+01 4.931e+01, threshold=7.318e+01, percent-clipped=0.0 2023-12-23 13:07:30,536 INFO [train.py:886] (1/4) Epoch 37, batch 900, loss[loss=0.009736, audio_tagging_loss=0.009736, over 23957.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4890419.41 frames. 
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:07:39,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1149840.0, ans=0.125 2023-12-23 13:07:50,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2023-12-23 13:07:53,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1149973.3333333333, ans=22.5 2023-12-23 13:07:58,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1149973.3333333333, ans=0.125 2023-12-23 13:08:16,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1150106.6666666667, ans=0.2 2023-12-23 13:08:18,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2023-12-23 13:08:23,567 INFO [train.py:886] (1/4) Epoch 37, batch 950, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4901700.00 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:08:35,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1150240.0, ans=0.1 2023-12-23 13:08:47,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-23 13:09:14,549 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.535e+01 3.650e+01 3.854e+01 4.325e+01, threshold=7.300e+01, percent-clipped=0.0 2023-12-23 13:09:14,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1150506.6666666667, ans=0.2 2023-12-23 13:09:15,524 INFO [train.py:886] (1/4) Epoch 37, batch 1000, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4910060.55 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:09:20,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.15 vs. limit=10.0 2023-12-23 13:09:32,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1150573.3333333333, ans=0.125 2023-12-23 13:09:41,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=22.5 2023-12-23 13:09:47,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2023-12-23 13:09:52,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.24 vs. 
limit=15.0 2023-12-23 13:10:04,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1150773.3333333333, ans=0.125 2023-12-23 13:10:07,174 INFO [train.py:886] (1/4) Epoch 37, batch 1050, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4915364.75 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:10:14,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1150840.0, ans=0.0 2023-12-23 13:10:23,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1150906.6666666667, ans=0.07 2023-12-23 13:10:24,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1150906.6666666667, ans=0.0 2023-12-23 13:10:26,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-12-23 13:10:31,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2023-12-23 13:10:55,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1151106.6666666667, ans=0.0 2023-12-23 13:10:57,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2023-12-23 13:10:58,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.461e+01 3.648e+01 3.824e+01 4.846e+01, threshold=7.297e+01, percent-clipped=0.0 2023-12-23 13:10:59,203 INFO [train.py:886] (1/4) Epoch 37, batch 1100, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4922462.99 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:11:13,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2023-12-23 13:11:28,077 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:11:30,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1151373.3333333333, ans=0.2 2023-12-23 13:11:32,848 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:11:50,172 INFO [train.py:886] (1/4) Epoch 37, batch 1150, loss[loss=0.0124, audio_tagging_loss=0.0124, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4929978.88 frames. 
], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:11:54,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1151506.6666666667, ans=0.125 2023-12-23 13:12:24,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1151706.6666666667, ans=0.125 2023-12-23 13:12:26,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2023-12-23 13:12:40,753 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.217e+01 3.496e+01 3.649e+01 3.812e+01 4.517e+01, threshold=7.299e+01, percent-clipped=0.0 2023-12-23 13:12:42,421 INFO [train.py:886] (1/4) Epoch 37, batch 1200, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4940644.44 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0 2023-12-23 13:12:55,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1151906.6666666667, ans=0.125 2023-12-23 13:13:06,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2023-12-23 13:13:08,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1151973.3333333333, ans=0.0 2023-12-23 13:13:08,882 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:13:34,078 INFO [train.py:886] (1/4) Epoch 37, batch 1250, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4938979.47 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:13:43,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1152240.0, ans=0.2 2023-12-23 13:14:06,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1152373.3333333333, ans=0.0 2023-12-23 13:14:23,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.12 vs. limit=22.5 2023-12-23 13:14:24,775 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.559e+01 3.736e+01 3.887e+01 4.435e+01, threshold=7.472e+01, percent-clipped=0.0 2023-12-23 13:14:25,751 INFO [train.py:886] (1/4) Epoch 37, batch 1300, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4938643.78 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:14:31,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2023-12-23 13:14:32,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1152506.6666666667, ans=0.0 2023-12-23 13:14:43,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1152573.3333333333, ans=0.125 2023-12-23 13:14:51,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1152640.0, ans=10.0 2023-12-23 13:15:05,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1152706.6666666667, ans=0.0 2023-12-23 13:15:18,425 INFO [train.py:886] (1/4) Epoch 37, batch 1350, loss[loss=0.009173, audio_tagging_loss=0.009173, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4937500.55 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:15:32,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0 2023-12-23 13:15:46,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1152973.3333333333, ans=0.0 2023-12-23 13:15:50,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1153040.0, ans=0.5 2023-12-23 13:15:59,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1153106.6666666667, ans=0.125 2023-12-23 13:16:00,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2023-12-23 13:16:02,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1153106.6666666667, ans=0.2 2023-12-23 13:16:08,517 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.118e+01 3.523e+01 3.683e+01 3.854e+01 4.444e+01, threshold=7.366e+01, percent-clipped=0.0 2023-12-23 13:16:10,191 INFO [train.py:886] (1/4) Epoch 37, batch 1400, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4940933.57 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:16:13,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1153173.3333333333, ans=0.0 2023-12-23 13:16:16,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1153173.3333333333, ans=0.1 2023-12-23 13:16:17,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. 
limit=15.0 2023-12-23 13:16:19,569 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:16:25,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1153240.0, ans=0.125 2023-12-23 13:16:39,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1153306.6666666667, ans=0.125 2023-12-23 13:16:47,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1153373.3333333333, ans=0.1 2023-12-23 13:16:51,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-12-23 13:16:54,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1153440.0, ans=0.0 2023-12-23 13:16:58,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1153440.0, ans=0.2 2023-12-23 13:17:02,930 INFO [train.py:886] (1/4) Epoch 37, batch 1450, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4948834.94 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:17:05,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-12-23 13:17:07,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1153506.6666666667, ans=0.1 2023-12-23 13:17:13,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1153573.3333333333, ans=0.1 2023-12-23 13:17:36,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1153706.6666666667, ans=0.125 2023-12-23 13:17:48,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-12-23 13:17:49,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2023-12-23 13:17:53,071 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.474e+01 3.653e+01 3.847e+01 4.349e+01, threshold=7.306e+01, percent-clipped=0.0 2023-12-23 13:17:54,037 INFO [train.py:886] (1/4) Epoch 37, batch 1500, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4952761.17 frames. 
], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:18:06,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1153906.6666666667, ans=0.2 2023-12-23 13:18:08,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1153906.6666666667, ans=0.125 2023-12-23 13:18:12,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1153906.6666666667, ans=0.125 2023-12-23 13:18:13,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1153906.6666666667, ans=0.2 2023-12-23 13:18:15,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1153973.3333333333, ans=0.125 2023-12-23 13:18:24,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1154040.0, ans=0.125 2023-12-23 13:18:29,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1154040.0, ans=0.04949747468305833 2023-12-23 13:18:40,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1154106.6666666667, ans=0.07 2023-12-23 13:18:41,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-12-23 13:18:42,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1154106.6666666667, ans=0.0 2023-12-23 13:18:46,618 INFO [train.py:886] (1/4) Epoch 37, batch 1550, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4948326.08 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:18:49,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1154173.3333333333, ans=0.0 2023-12-23 13:18:52,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1154173.3333333333, ans=0.04949747468305833 2023-12-23 13:19:37,585 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.596e+01 3.735e+01 3.939e+01 4.850e+01, threshold=7.470e+01, percent-clipped=0.0 2023-12-23 13:19:37,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1154506.6666666667, ans=0.2 2023-12-23 13:19:39,222 INFO [train.py:886] (1/4) Epoch 37, batch 1600, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4942101.85 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:19:41,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.94 vs. 
limit=22.5 2023-12-23 13:19:54,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1154573.3333333333, ans=0.125 2023-12-23 13:19:57,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1154640.0, ans=0.0 2023-12-23 13:20:13,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1154706.6666666667, ans=0.1 2023-12-23 13:20:27,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1154773.3333333333, ans=0.2 2023-12-23 13:20:28,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2023-12-23 13:20:29,953 INFO [train.py:886] (1/4) Epoch 37, batch 1650, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4941612.15 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:20:31,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1154840.0, ans=0.125 2023-12-23 13:20:32,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1154840.0, ans=0.125 2023-12-23 13:20:40,300 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:20:45,475 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:20:47,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-12-23 13:20:56,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1154973.3333333333, ans=0.125 2023-12-23 13:21:01,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1155040.0, ans=0.1 2023-12-23 13:21:10,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2023-12-23 13:21:10,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1155106.6666666667, ans=0.125 2023-12-23 13:21:20,726 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.491e+01 3.659e+01 3.859e+01 4.664e+01, threshold=7.317e+01, percent-clipped=0.0 2023-12-23 13:21:21,690 INFO [train.py:886] (1/4) Epoch 37, batch 1700, loss[loss=0.01417, audio_tagging_loss=0.01417, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4939028.93 frames. 
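The optim.py:484 WARNING lines report quartiles (min, 25%, median, 75%, max) of recent gradient norms together with a clipping threshold and the fraction of batches clipped. A rough sketch of that bookkeeping is below; the history length and the rule `threshold = clipping_scale * median` are assumptions for illustration, not the exact logic in icefall's optimizer.

```python
from collections import deque
import torch

class GradNormClipper:
    """Track recent grad norms and clip against a quartile-based threshold."""
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)
        self.num_batches = 0
        self.num_clipped = 0

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        total = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(total)
        self.num_batches += 1
        s = sorted(self.norms)
        # Quartiles as logged: min, 25%, median, 75%, max.
        quartiles = [s[int(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # assumed: scaled median
        if total > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / total)
        percent_clipped = 100.0 * self.num_clipped / self.num_batches
        return quartiles, threshold, percent_clipped
```

With the threshold sitting well above the typical norms, a percent-clipped of 0.0, as in the log, indicates the gradients are staying in a stable range.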
], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:21:24,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1155173.3333333333, ans=0.0 2023-12-23 13:21:36,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1155240.0, ans=0.0 2023-12-23 13:21:38,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0 2023-12-23 13:22:06,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1155440.0, ans=0.2 2023-12-23 13:22:11,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1155506.6666666667, ans=0.125 2023-12-23 13:22:12,591 INFO [train.py:886] (1/4) Epoch 37, batch 1750, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4946095.04 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:22:16,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1155506.6666666667, ans=0.2 2023-12-23 13:22:16,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1155506.6666666667, ans=0.125 2023-12-23 13:22:19,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1155506.6666666667, ans=0.0 2023-12-23 13:22:19,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-12-23 13:22:30,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1155573.3333333333, ans=0.125 2023-12-23 13:22:40,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1155640.0, ans=0.0 2023-12-23 13:22:42,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-12-23 13:22:50,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1155706.6666666667, ans=0.125 2023-12-23 13:23:00,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1155773.3333333333, ans=0.125 2023-12-23 13:23:03,627 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.050e+01 3.504e+01 3.681e+01 3.884e+01 4.385e+01, threshold=7.361e+01, percent-clipped=0.0 2023-12-23 13:23:04,636 INFO [train.py:886] (1/4) Epoch 37, batch 1800, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4946925.14 frames. 
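Each train.py:886 line pairs a single-batch loss (over roughly 25000 frames) with a `tot_loss` aggregated over about 4.9M frames, which is consistent with a frame-weighted average over a window of roughly 200 recent batches. A small sketch of that tracking, with the window size inferred only from the ratio of the two frame counts:

```python
from collections import deque

class WindowedLoss:
    """Frame-weighted average loss over the last `window` batches."""
    def __init__(self, window: int = 200):
        self.buf = deque(maxlen=window)  # (loss * frames, frames) pairs

    def update(self, loss: float, num_frames: float):
        self.buf.append((loss * num_frames, num_frames))

    def summary(self):
        weighted, frames = (sum(col) for col in zip(*self.buf))
        return weighted / frames, frames

tracker = WindowedLoss()
tracker.update(0.01066, 25000.0)  # per-batch numbers as in the batch-1750 line
tot, frames = tracker.summary()
print(f"tot_loss[loss={tot:.5f}, over {frames:.2f} frames.]")
```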
], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:23:17,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1155906.6666666667, ans=0.5 2023-12-23 13:23:23,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1155906.6666666667, ans=0.125 2023-12-23 13:23:28,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1155973.3333333333, ans=0.1 2023-12-23 13:23:49,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1156106.6666666667, ans=0.0 2023-12-23 13:23:56,196 INFO [train.py:886] (1/4) Epoch 37, batch 1850, loss[loss=0.01531, audio_tagging_loss=0.01531, over 24948.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4943198.89 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:23:57,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1156173.3333333333, ans=0.0 2023-12-23 13:24:05,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-23 13:24:06,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1156240.0, ans=0.0 2023-12-23 13:24:06,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=15.0 2023-12-23 13:24:11,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1156240.0, ans=0.0 2023-12-23 13:24:30,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1156373.3333333333, ans=0.0 2023-12-23 13:24:46,828 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.290e+01 3.589e+01 3.735e+01 3.882e+01 5.249e+01, threshold=7.471e+01, percent-clipped=0.0 2023-12-23 13:24:47,864 INFO [train.py:886] (1/4) Epoch 37, batch 1900, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4935266.70 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:24:57,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1156573.3333333333, ans=0.2 2023-12-23 13:25:04,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1156573.3333333333, ans=15.0 2023-12-23 13:25:33,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2023-12-23 13:25:36,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2023-12-23 13:25:39,109 INFO [train.py:886] (1/4) Epoch 37, batch 1950, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4938163.16 frames. 
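Note the `grad_scale` field doubling from 32.0 to 64.0 at batch 2000 above: that is the signature of dynamic loss scaling in mixed-precision training, where the scale grows after a fixed run of overflow-free steps. A generic sketch with torch's GradScaler follows; the model, criterion, and growth settings are placeholders rather than the recipe's actual configuration.

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, criterion, features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()  # scale up so fp16 grads don't underflow
    scaler.step(optimizer)         # unscales grads first; skips step on inf/nan
    scaler.update()                # doubles the scale after 2000 clean steps
    return loss.detach()
```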
], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2023-12-23 13:25:57,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1156906.6666666667, ans=0.0 2023-12-23 13:26:06,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1156973.3333333333, ans=0.125 2023-12-23 13:26:28,622 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.224e+01 3.487e+01 3.716e+01 3.886e+01 4.600e+01, threshold=7.432e+01, percent-clipped=0.0 2023-12-23 13:26:30,339 INFO [train.py:886] (1/4) Epoch 37, batch 2000, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4944903.88 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:26:42,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157240.0, ans=0.1 2023-12-23 13:26:51,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1157306.6666666667, ans=0.125 2023-12-23 13:26:52,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1157306.6666666667, ans=0.0 2023-12-23 13:26:54,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2023-12-23 13:27:00,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1157373.3333333333, ans=0.125 2023-12-23 13:27:13,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1157440.0, ans=0.125 2023-12-23 13:27:21,489 INFO [train.py:886] (1/4) Epoch 37, batch 2050, loss[loss=0.008993, audio_tagging_loss=0.008993, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4947383.55 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:27:46,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1157640.0, ans=0.05 2023-12-23 13:27:49,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1157640.0, ans=0.125 2023-12-23 13:27:50,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-12-23 13:27:56,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1157706.6666666667, ans=0.0 2023-12-23 13:27:58,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1157706.6666666667, ans=0.125 2023-12-23 13:28:06,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.98 vs. 
limit=22.5 2023-12-23 13:28:10,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1157773.3333333333, ans=0.125 2023-12-23 13:28:12,075 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.452e+01 3.576e+01 3.817e+01 4.679e+01, threshold=7.151e+01, percent-clipped=0.0 2023-12-23 13:28:13,069 INFO [train.py:886] (1/4) Epoch 37, batch 2100, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4948554.45 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:28:17,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-12-23 13:28:20,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1157840.0, ans=0.125 2023-12-23 13:28:36,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1157973.3333333333, ans=0.125 2023-12-23 13:28:41,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1157973.3333333333, ans=0.1 2023-12-23 13:28:41,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1157973.3333333333, ans=0.0 2023-12-23 13:28:49,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1158040.0, ans=0.125 2023-12-23 13:29:04,797 INFO [train.py:886] (1/4) Epoch 37, batch 2150, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4952872.76 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:29:05,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1158173.3333333333, ans=0.0 2023-12-23 13:29:13,010 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:29:33,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.97 vs. limit=10.0 2023-12-23 13:29:43,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1158373.3333333333, ans=0.125 2023-12-23 13:29:54,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1158440.0, ans=0.0 2023-12-23 13:29:55,697 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.073e+01 3.588e+01 3.736e+01 3.896e+01 4.550e+01, threshold=7.472e+01, percent-clipped=0.0 2023-12-23 13:29:56,692 INFO [train.py:886] (1/4) Epoch 37, batch 2200, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4950795.28 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:30:21,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. 
limit=15.0 2023-12-23 13:30:31,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1158706.6666666667, ans=0.0 2023-12-23 13:30:49,051 INFO [train.py:886] (1/4) Epoch 37, batch 2250, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24017.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4939322.33 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:30:49,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2023-12-23 13:30:54,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1158840.0, ans=0.0 2023-12-23 13:30:56,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=12.0 2023-12-23 13:31:00,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1158906.6666666667, ans=0.0 2023-12-23 13:31:09,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2023-12-23 13:31:33,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1159106.6666666667, ans=0.0 2023-12-23 13:31:38,835 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.103e+01 3.551e+01 3.685e+01 3.844e+01 4.557e+01, threshold=7.371e+01, percent-clipped=0.0 2023-12-23 13:31:40,560 INFO [train.py:886] (1/4) Epoch 37, batch 2300, loss[loss=0.009496, audio_tagging_loss=0.009496, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4945993.93 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:31:41,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2023-12-23 13:31:42,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1159173.3333333333, ans=0.2 2023-12-23 13:31:42,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1159173.3333333333, ans=0.0 2023-12-23 13:31:49,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1159173.3333333333, ans=0.07 2023-12-23 13:32:31,786 INFO [train.py:886] (1/4) Epoch 37, batch 2350, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4947281.37 frames. 
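The scaling.py:1022 lines compare a per-module whitening `metric` against a scheduled `limit`; the metric grows as the module's output covariance across `num_channels` (optionally split into `num_groups`) departs from white. One plausible way to compute such a number is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1.0 for perfectly white features; this definition is an assumption for illustration, not icefall's exact formula.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns the worst group's metric."""
    n, c = x.shape
    worst = 0.0
    for g in x.reshape(n, num_groups, c // num_groups).unbind(dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / n
        eigs = torch.linalg.eigvalsh(cov)  # nonnegative for a covariance
        worst = max(worst, ((eigs**2).mean() / eigs.mean() ** 2).item())
    return worst

x = torch.randn(2000, 512)   # near-white activations
print(whitening_metric(x))   # ~1.0; strongly correlated channels score far higher
```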
], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:32:34,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1159506.6666666667, ans=0.125 2023-12-23 13:33:12,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1159773.3333333333, ans=0.0 2023-12-23 13:33:19,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1159773.3333333333, ans=0.1 2023-12-23 13:33:21,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1159773.3333333333, ans=0.0 2023-12-23 13:33:22,665 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.090e+01 3.485e+01 3.642e+01 3.779e+01 4.463e+01, threshold=7.283e+01, percent-clipped=0.0 2023-12-23 13:33:23,708 INFO [train.py:886] (1/4) Epoch 37, batch 2400, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4950151.09 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0 2023-12-23 13:33:27,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1159840.0, ans=0.1 2023-12-23 13:33:39,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1159906.6666666667, ans=0.125 2023-12-23 13:33:39,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-12-23 13:33:45,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1159973.3333333333, ans=0.025 2023-12-23 13:33:46,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1159973.3333333333, ans=0.1 2023-12-23 13:34:00,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1160040.0, ans=0.125 2023-12-23 13:34:04,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1160106.6666666667, ans=0.125 2023-12-23 13:34:05,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1160106.6666666667, ans=0.0 2023-12-23 13:34:07,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1160106.6666666667, ans=0.125 2023-12-23 13:34:14,809 INFO [train.py:886] (1/4) Epoch 37, batch 2450, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4954192.64 frames. 
], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:34:46,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1160373.3333333333, ans=0.0 2023-12-23 13:34:47,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1160373.3333333333, ans=0.0 2023-12-23 13:34:58,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1160440.0, ans=0.1 2023-12-23 13:34:59,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-12-23 13:35:00,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1160440.0, ans=10.0 2023-12-23 13:35:06,194 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.251e+01 3.588e+01 3.750e+01 3.881e+01 5.588e+01, threshold=7.500e+01, percent-clipped=0.0 2023-12-23 13:35:07,156 INFO [train.py:886] (1/4) Epoch 37, batch 2500, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4950053.54 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:35:15,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2023-12-23 13:35:23,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1160573.3333333333, ans=0.2 2023-12-23 13:35:26,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1160640.0, ans=0.125 2023-12-23 13:35:38,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1160706.6666666667, ans=0.1 2023-12-23 13:35:40,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1160706.6666666667, ans=0.2 2023-12-23 13:35:46,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-12-23 13:35:49,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1160773.3333333333, ans=0.1 2023-12-23 13:35:57,200 INFO [train.py:886] (1/4) Epoch 37, batch 2550, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4944461.59 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:35:59,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1160840.0, ans=0.0 2023-12-23 13:36:13,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.53 vs. limit=10.0 2023-12-23 13:36:44,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.98 vs. 
limit=22.5 2023-12-23 13:36:45,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1161106.6666666667, ans=0.0 2023-12-23 13:36:48,130 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.558e+01 3.722e+01 3.971e+01 4.498e+01, threshold=7.443e+01, percent-clipped=0.0 2023-12-23 13:36:49,158 INFO [train.py:886] (1/4) Epoch 37, batch 2600, loss[loss=0.01173, audio_tagging_loss=0.01173, over 22609.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4944875.71 frames. ], batch size: 107, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:37:18,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1161306.6666666667, ans=0.09899494936611666 2023-12-23 13:37:20,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1161373.3333333333, ans=0.125 2023-12-23 13:37:42,245 INFO [train.py:886] (1/4) Epoch 37, batch 2650, loss[loss=0.009883, audio_tagging_loss=0.009883, over 21756.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4942431.70 frames. ], batch size: 107, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:37:43,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1161506.6666666667, ans=0.0 2023-12-23 13:37:45,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2023-12-23 13:37:46,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2023-12-23 13:37:51,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1161573.3333333333, ans=0.2 2023-12-23 13:38:00,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-23 13:38:05,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2023-12-23 13:38:05,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1161640.0, ans=0.125 2023-12-23 13:38:12,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1161706.6666666667, ans=0.1 2023-12-23 13:38:31,342 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.155e+01 3.489e+01 3.610e+01 3.806e+01 4.521e+01, threshold=7.219e+01, percent-clipped=0.0 2023-12-23 13:38:32,318 INFO [train.py:886] (1/4) Epoch 37, batch 2700, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4945394.11 frames. 
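Many of the scheduled values above are `*_skip_rate`s (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, and so on): probabilities of stochastically bypassing a sub-module during training, which acts as a regularizer. A sketch of that pattern, with the residual wiring assumed for illustration:

```python
import torch
import torch.nn as nn

class Skippable(nn.Module):
    """Residually apply `inner`, but skip it with probability skip_rate in training."""
    def __init__(self, inner: nn.Module, skip_rate: float = 0.05):
        super().__init__()
        self.inner = inner
        self.skip_rate = skip_rate  # in practice scheduled by batch_count, as logged

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()).item() < self.skip_rate:
            return x                  # identity: sub-module skipped this step
        return x + self.inner(x)      # normal residual path

# Using one of the exact skip rates that appears in the log above:
layer = Skippable(nn.Linear(256, 256), skip_rate=0.04949747468305833)
```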
], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:38:33,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1161840.0, ans=0.0 2023-12-23 13:38:58,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1161973.3333333333, ans=0.125 2023-12-23 13:39:04,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1162040.0, ans=0.1 2023-12-23 13:39:05,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1162040.0, ans=0.0 2023-12-23 13:39:25,532 INFO [train.py:886] (1/4) Epoch 37, batch 2750, loss[loss=0.01473, audio_tagging_loss=0.01473, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4944799.46 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:39:33,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-12-23 13:39:49,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1162306.6666666667, ans=0.125 2023-12-23 13:40:08,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1162440.0, ans=0.5 2023-12-23 13:40:15,310 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.145e+01 3.537e+01 3.670e+01 3.855e+01 4.195e+01, threshold=7.340e+01, percent-clipped=0.0 2023-12-23 13:40:16,316 INFO [train.py:886] (1/4) Epoch 37, batch 2800, loss[loss=0.009716, audio_tagging_loss=0.009716, over 23993.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4945402.21 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:40:31,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1162573.3333333333, ans=0.0 2023-12-23 13:40:40,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1162640.0, ans=0.125 2023-12-23 13:41:08,999 INFO [train.py:886] (1/4) Epoch 37, batch 2850, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4943946.65 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:41:21,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1162906.6666666667, ans=0.0 2023-12-23 13:41:24,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1162906.6666666667, ans=0.0 2023-12-23 13:41:28,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. 
limit=15.0 2023-12-23 13:41:32,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1162973.3333333333, ans=0.125 2023-12-23 13:41:48,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1163040.0, ans=0.2 2023-12-23 13:41:51,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1163106.6666666667, ans=0.125 2023-12-23 13:41:52,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-23 13:41:56,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1163106.6666666667, ans=0.125 2023-12-23 13:42:00,274 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.535e+01 3.716e+01 3.872e+01 4.379e+01, threshold=7.432e+01, percent-clipped=0.0 2023-12-23 13:42:01,223 INFO [train.py:886] (1/4) Epoch 37, batch 2900, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4944289.52 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:42:02,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1163173.3333333333, ans=0.0 2023-12-23 13:42:05,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1163173.3333333333, ans=0.125 2023-12-23 13:42:05,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1163173.3333333333, ans=0.1 2023-12-23 13:42:06,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1163173.3333333333, ans=0.125 2023-12-23 13:42:09,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1163173.3333333333, ans=0.0 2023-12-23 13:42:09,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-12-23 13:42:39,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-12-23 13:42:52,807 INFO [train.py:886] (1/4) Epoch 37, batch 2950, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4945765.74 frames. 
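The balancer entries (prob, min_positive, max_abs, min_abs) describe constraints on per-channel activation statistics that are enforced stochastically with probability `prob`. As a crude, clearly simplified stand-in, the sketch below returns an auxiliary penalty when those statistics leave the configured range; the real mechanism modifies gradients directly rather than adding a loss term, so treat this only as an illustration of the constraint being enforced.

```python
import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.025,
                     max_abs: float = 10.0,
                     prob: float = 0.125) -> torch.Tensor:
    """x: (num_frames, num_channels). Zero when statistics are in range."""
    if torch.rand(()).item() >= prob:   # enforce only on a fraction of steps
        return x.new_zeros(())
    # Soft, differentiable proxy for the fraction of positive values per channel.
    frac_positive = torch.sigmoid(x / 0.1).mean(dim=0)
    mean_abs = x.abs().mean(dim=0)      # per-channel magnitude
    penalty = (min_positive - frac_positive).clamp(min=0).sum() \
            + (mean_abs - max_abs).clamp(min=0).sum()
    return penalty

x = torch.randn(1000, 64, requires_grad=True)
aux = balancer_penalty(x)  # would be added to the training loss; usually zero
```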
], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:43:02,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1163506.6666666667, ans=0.1 2023-12-23 13:43:08,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1163573.3333333333, ans=0.125 2023-12-23 13:43:25,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1163706.6666666667, ans=0.0 2023-12-23 13:43:31,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1163706.6666666667, ans=0.1 2023-12-23 13:43:31,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1163706.6666666667, ans=0.125 2023-12-23 13:43:34,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1163773.3333333333, ans=0.125 2023-12-23 13:43:36,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1163773.3333333333, ans=0.125 2023-12-23 13:43:39,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1163773.3333333333, ans=0.0 2023-12-23 13:43:43,102 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.508e+01 3.692e+01 3.793e+01 4.892e+01, threshold=7.383e+01, percent-clipped=0.0 2023-12-23 13:43:44,824 INFO [train.py:886] (1/4) Epoch 37, batch 3000, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4951030.58 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:43:44,824 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 13:43:58,316 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7499, 5.8858, 5.3078, 5.5870], device='cuda:1') 2023-12-23 13:44:04,223 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9871, 3.3879, 3.8129, 3.8800], device='cuda:1') 2023-12-23 13:44:05,925 INFO [train.py:917] (1/4) Epoch 37, validation: loss=0.03402, audio_tagging_loss=0.03402, over 3737520.00 frames. 2023-12-23 13:44:05,925 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 13:44:12,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.90 vs. 
limit=15.0 2023-12-23 13:44:13,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1163840.0, ans=0.1 2023-12-23 13:44:25,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1163973.3333333333, ans=0.125 2023-12-23 13:44:42,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1164040.0, ans=0.125 2023-12-23 13:44:49,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1164106.6666666667, ans=15.0 2023-12-23 13:44:52,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0 2023-12-23 13:44:55,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-12-23 13:44:57,776 INFO [train.py:886] (1/4) Epoch 37, batch 3050, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4952195.50 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:44:57,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1164173.3333333333, ans=0.1 2023-12-23 13:45:04,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1164173.3333333333, ans=0.125 2023-12-23 13:45:13,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1164240.0, ans=0.2 2023-12-23 13:45:33,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1164373.3333333333, ans=0.0 2023-12-23 13:45:35,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1164373.3333333333, ans=0.2 2023-12-23 13:45:37,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1164440.0, ans=0.2 2023-12-23 13:45:44,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1164440.0, ans=0.05 2023-12-23 13:45:48,485 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.196e+01 3.531e+01 3.694e+01 3.920e+01 4.438e+01, threshold=7.387e+01, percent-clipped=0.0 2023-12-23 13:45:49,442 INFO [train.py:886] (1/4) Epoch 37, batch 3100, loss[loss=0.01099, audio_tagging_loss=0.01099, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4948141.18 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:46:16,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1164640.0, ans=0.1 2023-12-23 13:46:41,324 INFO [train.py:886] (1/4) Epoch 37, batch 3150, loss[loss=0.0134, audio_tagging_loss=0.0134, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4944558.99 frames. 
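During the validation pass above, zipformer.py:1858 prints `attn_weights_entropy` per attention module: one entropy value per head, summarizing how diffuse that head's attention distribution is. A sketch of that computation follows; the exact reduction over batch and time in the real code is assumed here to be a mean.

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, tgt_len, src_len), each row a probability distribution."""
    row_entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, tgt_len)
    return row_entropy.mean(dim=-1)                           # one value per head

attn = torch.softmax(torch.randn(4, 60, 60), dim=-1)
print(attn_weights_entropy(attn))  # near log(60) ~ 4.09 for diffuse heads
```

Higher values (as in the tensors logged above) mean the head spreads attention broadly; values near zero would indicate attention collapsing onto single frames.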
], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:46:47,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0 2023-12-23 13:47:02,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1164973.3333333333, ans=0.125 2023-12-23 13:47:03,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1164973.3333333333, ans=0.125 2023-12-23 13:47:06,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1164973.3333333333, ans=0.0 2023-12-23 13:47:18,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1165040.0, ans=0.125 2023-12-23 13:47:24,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1165106.6666666667, ans=0.125 2023-12-23 13:47:32,504 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.567e+01 3.724e+01 3.935e+01 4.508e+01, threshold=7.447e+01, percent-clipped=0.0 2023-12-23 13:47:33,499 INFO [train.py:886] (1/4) Epoch 37, batch 3200, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4943245.11 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:47:52,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1165240.0, ans=0.2 2023-12-23 13:47:57,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1165306.6666666667, ans=0.125 2023-12-23 13:48:01,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1165306.6666666667, ans=0.2 2023-12-23 13:48:25,167 INFO [train.py:886] (1/4) Epoch 37, batch 3250, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4944799.04 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:48:37,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1165573.3333333333, ans=0.2 2023-12-23 13:48:42,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1165573.3333333333, ans=0.125 2023-12-23 13:48:44,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-23 13:48:53,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1165640.0, ans=0.125 2023-12-23 13:48:55,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1165706.6666666667, ans=0.1 2023-12-23 13:48:55,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1165706.6666666667, ans=0.125 2023-12-23 13:48:58,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. 
limit=22.5 2023-12-23 13:49:01,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1165706.6666666667, ans=0.125 2023-12-23 13:49:02,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1165706.6666666667, ans=0.125 2023-12-23 13:49:15,392 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.509e+01 3.653e+01 3.824e+01 4.316e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 13:49:16,377 INFO [train.py:886] (1/4) Epoch 37, batch 3300, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4944280.12 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:50:05,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1166106.6666666667, ans=0.0 2023-12-23 13:50:07,856 INFO [train.py:886] (1/4) Epoch 37, batch 3350, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4948376.12 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:50:28,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1166306.6666666667, ans=0.0 2023-12-23 13:50:29,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:50:31,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1166306.6666666667, ans=0.2 2023-12-23 13:50:34,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2023-12-23 13:50:52,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1166440.0, ans=0.125 2023-12-23 13:50:53,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1166440.0, ans=0.125 2023-12-23 13:50:56,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-12-23 13:50:58,583 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.110e+01 3.562e+01 3.721e+01 3.851e+01 4.497e+01, threshold=7.442e+01, percent-clipped=0.0 2023-12-23 13:51:00,229 INFO [train.py:886] (1/4) Epoch 37, batch 3400, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4948803.89 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:51:11,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.76 vs. 
limit=22.5 2023-12-23 13:51:14,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1166573.3333333333, ans=0.125 2023-12-23 13:51:36,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1166706.6666666667, ans=0.125 2023-12-23 13:51:48,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1166773.3333333333, ans=0.0 2023-12-23 13:51:50,564 INFO [train.py:886] (1/4) Epoch 37, batch 3450, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4939794.04 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:52:04,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1166906.6666666667, ans=0.0 2023-12-23 13:52:21,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1167040.0, ans=0.125 2023-12-23 13:52:34,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1167106.6666666667, ans=0.2 2023-12-23 13:52:41,674 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.580e+01 3.707e+01 3.907e+01 4.394e+01, threshold=7.413e+01, percent-clipped=0.0 2023-12-23 13:52:42,665 INFO [train.py:886] (1/4) Epoch 37, batch 3500, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4943516.37 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:52:50,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1167173.3333333333, ans=0.0 2023-12-23 13:52:50,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1167173.3333333333, ans=0.0 2023-12-23 13:52:58,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-12-23 13:53:00,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1167240.0, ans=0.0 2023-12-23 13:53:07,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1167306.6666666667, ans=0.5 2023-12-23 13:53:08,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1167306.6666666667, ans=0.05 2023-12-23 13:53:35,209 INFO [train.py:886] (1/4) Epoch 37, batch 3550, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4946583.22 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 32.0 2023-12-23 13:54:00,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1167640.0, ans=0.125 2023-12-23 13:54:14,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. 
limit=15.0 2023-12-23 13:54:16,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1167773.3333333333, ans=0.0 2023-12-23 13:54:26,592 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.024e+01 3.522e+01 3.659e+01 3.853e+01 4.426e+01, threshold=7.319e+01, percent-clipped=0.0 2023-12-23 13:54:26,618 INFO [train.py:886] (1/4) Epoch 37, batch 3600, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4950744.95 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 32.0 2023-12-23 13:54:54,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1167973.3333333333, ans=0.125 2023-12-23 13:55:01,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1168040.0, ans=0.025 2023-12-23 13:55:01,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1168040.0, ans=0.1 2023-12-23 13:55:03,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=22.5 2023-12-23 13:55:07,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1168106.6666666667, ans=0.125 2023-12-23 13:55:08,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2023-12-23 13:55:20,003 INFO [train.py:886] (1/4) Epoch 37, batch 3650, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4953038.84 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:55:24,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1168173.3333333333, ans=0.0 2023-12-23 13:55:51,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-12-23 13:55:56,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-23 13:55:58,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1168373.3333333333, ans=0.125 2023-12-23 13:55:59,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1168373.3333333333, ans=0.125 2023-12-23 13:56:00,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1168440.0, ans=0.0 2023-12-23 13:56:11,231 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.231e+01 3.538e+01 3.706e+01 3.858e+01 4.224e+01, threshold=7.412e+01, percent-clipped=0.0 2023-12-23 13:56:11,257 INFO [train.py:886] (1/4) Epoch 37, batch 3700, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4948669.60 frames. 
], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:56:37,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-12-23 13:56:58,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1168773.3333333333, ans=0.1 2023-12-23 13:57:01,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1168840.0, ans=0.125 2023-12-23 13:57:02,253 INFO [train.py:886] (1/4) Epoch 37, batch 3750, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4952923.36 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:57:02,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1168840.0, ans=0.2 2023-12-23 13:57:04,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1168840.0, ans=0.2 2023-12-23 13:57:13,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1168906.6666666667, ans=0.125 2023-12-23 13:57:19,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2023-12-23 13:57:26,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1168973.3333333333, ans=0.1 2023-12-23 13:57:35,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1169040.0, ans=0.125 2023-12-23 13:57:37,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1169040.0, ans=0.125 2023-12-23 13:57:39,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-23 13:57:41,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0 2023-12-23 13:57:44,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1169106.6666666667, ans=0.125 2023-12-23 13:57:45,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=1169106.6666666667, ans=0.2 2023-12-23 13:57:46,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1169106.6666666667, ans=0.07 2023-12-23 13:57:52,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1169106.6666666667, ans=0.125 2023-12-23 13:57:54,661 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.286e+01 3.646e+01 3.784e+01 3.939e+01 4.535e+01, threshold=7.569e+01, percent-clipped=0.0 2023-12-23 13:57:54,686 INFO [train.py:886] (1/4) Epoch 37, batch 3800, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4953749.87 frames. 
], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:57:54,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1169173.3333333333, ans=0.125 2023-12-23 13:58:00,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1169173.3333333333, ans=0.125 2023-12-23 13:58:01,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2023-12-23 13:58:03,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1169173.3333333333, ans=0.09899494936611666 2023-12-23 13:58:07,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1169240.0, ans=0.0 2023-12-23 13:58:07,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1169240.0, ans=0.125 2023-12-23 13:58:22,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=1169306.6666666667, ans=15.0 2023-12-23 13:58:25,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-12-23 13:58:42,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2023-12-23 13:58:45,923 INFO [train.py:886] (1/4) Epoch 37, batch 3850, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4951184.07 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:58:53,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1169506.6666666667, ans=0.0 2023-12-23 13:58:57,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. 
limit=22.5 2023-12-23 13:59:04,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169573.3333333333, ans=0.1 2023-12-23 13:59:08,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1169640.0, ans=0.125 2023-12-23 13:59:14,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1169640.0, ans=0.0 2023-12-23 13:59:21,908 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:59:25,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1169706.6666666667, ans=0.125 2023-12-23 13:59:29,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169773.3333333333, ans=0.1 2023-12-23 13:59:39,047 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.134e+01 3.574e+01 3.713e+01 3.863e+01 4.498e+01, threshold=7.426e+01, percent-clipped=0.0 2023-12-23 13:59:39,074 INFO [train.py:886] (1/4) Epoch 37, batch 3900, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4952154.32 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:59:42,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1169840.0, ans=0.0 2023-12-23 13:59:55,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1169906.6666666667, ans=0.0 2023-12-23 13:59:57,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1169906.6666666667, ans=0.1 2023-12-23 14:00:06,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2023-12-23 14:00:06,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169973.3333333333, ans=0.1 2023-12-23 14:00:10,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1170040.0, ans=0.0 2023-12-23 14:00:12,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0 2023-12-23 14:00:15,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1170040.0, ans=0.125 2023-12-23 14:00:18,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1170106.6666666667, ans=0.0 2023-12-23 14:00:29,162 INFO [train.py:886] (1/4) Epoch 37, batch 3950, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4958783.71 frames. 
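
One regularity in the optim.py WARNING lines is worth spelling out: the reported threshold is consistently twice the middle of the five grad-norm quartiles, e.g. 7.426e+01 = 2.0 * 3.713e+01 in the entry just above, matching Clipping_scale=2.0. The clipping threshold is therefore adaptive, derived from the median of recently observed gradient norms rather than fixed. Below is a simplified reconstruction of that behaviour; the real logic lives inside icefall's ScaledAdam optimizer and will differ in detail (the window size here is a guess).

import statistics
from collections import deque

class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=200):  # window size is a guess
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_factor(self, grad_norm):
        # Returns the factor to multiply gradients by (1.0 = no clipping).
        self.norms.append(grad_norm)
        if len(self.norms) < 4:
            return 1.0
        _q1, median, _q3 = statistics.quantiles(self.norms, n=4)
        threshold = self.clipping_scale * median  # e.g. 2.0 * 3.713e+01 = 7.426e+01
        return min(1.0, threshold / max(grad_norm, 1e-20))

clipper = MedianGradClipper()
for norm in (35.0, 36.0, 37.0, 38.0, 100.0):
    print(norm, clipper.clip_factor(norm))  # only the 100.0 outlier gets scaled

Under this scheme, percent-clipped=0.0 in the log simply means no recent batch exceeded twice the median norm.
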
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:00:31,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1170173.3333333333, ans=0.0 2023-12-23 14:01:04,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1170373.3333333333, ans=0.125 2023-12-23 14:01:04,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1170373.3333333333, ans=0.125 2023-12-23 14:01:09,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1170373.3333333333, ans=0.125 2023-12-23 14:01:21,650 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.552e+01 3.697e+01 3.830e+01 4.529e+01, threshold=7.395e+01, percent-clipped=0.0 2023-12-23 14:01:21,675 INFO [train.py:886] (1/4) Epoch 37, batch 4000, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4956608.88 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:01:33,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1170573.3333333333, ans=0.0 2023-12-23 14:01:53,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1170706.6666666667, ans=0.0 2023-12-23 14:02:13,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1170840.0, ans=0.125 2023-12-23 14:02:14,052 INFO [train.py:886] (1/4) Epoch 37, batch 4050, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4951834.76 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:02:17,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1170840.0, ans=0.125 2023-12-23 14:02:25,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2023-12-23 14:02:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1170906.6666666667, ans=0.125 2023-12-23 14:02:33,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. 
limit=22.5 2023-12-23 14:02:46,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1171040.0, ans=22.5 2023-12-23 14:02:47,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1171040.0, ans=0.0 2023-12-23 14:02:51,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1171040.0, ans=0.125 2023-12-23 14:02:58,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171106.6666666667, ans=0.1 2023-12-23 14:03:06,581 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.265e+01 3.643e+01 3.788e+01 3.914e+01 4.371e+01, threshold=7.577e+01, percent-clipped=0.0 2023-12-23 14:03:06,607 INFO [train.py:886] (1/4) Epoch 37, batch 4100, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4948395.52 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:03:17,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1171240.0, ans=0.125 2023-12-23 14:03:22,620 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:03:59,160 INFO [train.py:886] (1/4) Epoch 37, batch 4150, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4945087.77 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:04:50,304 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.521e+01 3.688e+01 3.874e+01 5.030e+01, threshold=7.375e+01, percent-clipped=0.0 2023-12-23 14:04:50,329 INFO [train.py:886] (1/4) Epoch 37, batch 4200, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4947214.81 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:04:54,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1171840.0, ans=0.125 2023-12-23 14:05:03,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=12.0 2023-12-23 14:05:16,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1171973.3333333333, ans=0.125 2023-12-23 14:05:42,852 INFO [train.py:886] (1/4) Epoch 37, batch 4250, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4954981.71 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:05:43,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1172173.3333333333, ans=0.125 2023-12-23 14:05:54,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1172240.0, ans=0.125 2023-12-23 14:05:56,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1172240.0, ans=0.0 2023-12-23 14:06:00,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172240.0, ans=0.1 2023-12-23 14:06:05,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1172306.6666666667, ans=0.1 2023-12-23 14:06:06,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-12-23 14:06:16,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1172373.3333333333, ans=0.0 2023-12-23 14:06:25,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1172440.0, ans=0.1 2023-12-23 14:06:35,037 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.544e+01 3.699e+01 3.834e+01 4.591e+01, threshold=7.397e+01, percent-clipped=0.0 2023-12-23 14:06:35,063 INFO [train.py:886] (1/4) Epoch 37, batch 4300, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4958559.59 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:06:41,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-12-23 14:06:45,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1172573.3333333333, ans=0.2 2023-12-23 14:06:51,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1172573.3333333333, ans=0.125 2023-12-23 14:06:54,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2023-12-23 14:07:03,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1172640.0, ans=0.125 2023-12-23 14:07:11,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=15.0 2023-12-23 14:07:17,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-12-23 14:07:19,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1172773.3333333333, ans=0.0 2023-12-23 14:07:26,729 INFO [train.py:886] (1/4) Epoch 37, batch 4350, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4958004.53 frames. 
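
The Whitening lines report a metric against a per-submodule limit (6.0, 10.0, 12.0, 15.0, 22.5), and the metric never falls below 1 anywhere in the log. One statistic with exactly that behaviour, and plausibly close to what scaling.py computes (I have not verified this against the source), is the ratio of the mean squared eigenvalue of the activation covariance to its squared mean eigenvalue: it is 1.0 when activations are perfectly white and grows as the covariance becomes ill-conditioned. A sketch, using the Frobenius norm so no eigendecomposition is needed:

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels). Equals 1.0 iff each group's covariance
    # is a multiple of the identity ("white"); grows as it becomes
    # ill-conditioned. This is a guess at the statistic, not scaling.py code.
    num_frames, num_channels = x.shape
    g = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, g).transpose(0, 1)   # (groups, T, g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames                   # (groups, g, g)
    trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)            # sum of eigenvalues
    frob2 = (cov * cov).sum(dim=(1, 2))                        # sum of squared eigenvalues
    return ((frob2 / g) / (trace / g).pow(2)).mean()           # mean(l^2) / mean(l)^2

print(whitening_metric(torch.randn(4000, 512)))                      # ~1.1: white noise
print(whitening_metric(torch.randn(4000, 1) * torch.ones(1, 512)))   # 512.0: rank-1, far from white
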
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:07:30,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1172840.0, ans=0.125 2023-12-23 14:07:33,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1172840.0, ans=0.1 2023-12-23 14:07:37,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172906.6666666667, ans=0.1 2023-12-23 14:07:51,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1172973.3333333333, ans=0.125 2023-12-23 14:08:10,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1173106.6666666667, ans=0.125 2023-12-23 14:08:18,190 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.237e+01 3.582e+01 3.726e+01 3.912e+01 4.794e+01, threshold=7.453e+01, percent-clipped=0.0 2023-12-23 14:08:18,215 INFO [train.py:886] (1/4) Epoch 37, batch 4400, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4950134.98 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:08:24,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2023-12-23 14:08:40,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2023-12-23 14:08:42,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1173306.6666666667, ans=0.125 2023-12-23 14:08:47,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1173306.6666666667, ans=0.035 2023-12-23 14:08:50,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1173373.3333333333, ans=0.125 2023-12-23 14:09:13,017 INFO [train.py:886] (1/4) Epoch 37, batch 4450, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4945836.32 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:09:13,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0 2023-12-23 14:09:14,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1173506.6666666667, ans=0.1 2023-12-23 14:09:21,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. 
limit=10.0 2023-12-23 14:09:25,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1173573.3333333333, ans=0.2 2023-12-23 14:09:27,087 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:09:38,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2023-12-23 14:10:05,056 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.567e+01 3.732e+01 3.907e+01 4.248e+01, threshold=7.464e+01, percent-clipped=0.0 2023-12-23 14:10:05,083 INFO [train.py:886] (1/4) Epoch 37, batch 4500, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4941219.92 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:10:09,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1173840.0, ans=0.125 2023-12-23 14:10:20,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1173906.6666666667, ans=0.125 2023-12-23 14:10:41,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1174040.0, ans=15.0 2023-12-23 14:10:54,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1174106.6666666667, ans=0.0 2023-12-23 14:10:56,695 INFO [train.py:886] (1/4) Epoch 37, batch 4550, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4948911.66 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:11:02,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1174173.3333333333, ans=0.0 2023-12-23 14:11:03,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1174173.3333333333, ans=0.07 2023-12-23 14:11:10,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1174240.0, ans=0.0 2023-12-23 14:11:35,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1174373.3333333333, ans=0.0 2023-12-23 14:11:39,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1174440.0, ans=0.125 2023-12-23 14:11:47,706 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.143e+01 3.524e+01 3.710e+01 3.904e+01 4.910e+01, threshold=7.420e+01, percent-clipped=0.0 2023-12-23 14:11:47,732 INFO [train.py:886] (1/4) Epoch 37, batch 4600, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4950342.99 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:11:55,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1174506.6666666667, ans=0.0 2023-12-23 14:12:02,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.25 vs. limit=15.0 2023-12-23 14:12:40,250 INFO [train.py:886] (1/4) Epoch 37, batch 4650, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24938.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4954062.01 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:13:02,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1174973.3333333333, ans=0.0 2023-12-23 14:13:04,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1174973.3333333333, ans=0.0 2023-12-23 14:13:04,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1174973.3333333333, ans=0.0 2023-12-23 14:13:07,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1174973.3333333333, ans=0.0 2023-12-23 14:13:15,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1175040.0, ans=0.125 2023-12-23 14:13:23,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2023-12-23 14:13:24,041 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:13:31,795 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.187e+01 3.524e+01 3.714e+01 3.861e+01 4.972e+01, threshold=7.428e+01, percent-clipped=0.0 2023-12-23 14:13:31,821 INFO [train.py:886] (1/4) Epoch 37, batch 4700, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4956815.97 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:13:41,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1175240.0, ans=0.2 2023-12-23 14:14:18,216 INFO [train.py:886] (1/4) Epoch 37, batch 4750, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4953493.71 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:14:23,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1175506.6666666667, ans=0.0 2023-12-23 14:14:30,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1175573.3333333333, ans=0.0 2023-12-23 14:14:52,795 INFO [train.py:886] (1/4) Epoch 38, batch 0, loss[loss=0.02464, audio_tagging_loss=0.02464, over 25000.00 frames. ], tot_loss[loss=0.02464, audio_tagging_loss=0.02464, over 25000.00 frames. 
], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:14:52,796 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 14:15:14,007 INFO [train.py:917] (1/4) Epoch 38, validation: loss=0.03366, audio_tagging_loss=0.03366, over 3737520.00 frames. 2023-12-23 14:15:14,008 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 14:15:17,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1175613.3333333333, ans=0.125 2023-12-23 14:15:22,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1175613.3333333333, ans=0.125 2023-12-23 14:15:23,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.25 vs. limit=22.5 2023-12-23 14:15:36,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1175746.6666666667, ans=0.125 2023-12-23 14:15:48,237 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.219e+01 3.729e+01 3.996e+01 5.182e+01 1.024e+02, threshold=7.991e+01, percent-clipped=5.0 2023-12-23 14:15:56,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2023-12-23 14:16:06,244 INFO [train.py:886] (1/4) Epoch 38, batch 50, loss[loss=0.01625, audio_tagging_loss=0.01625, over 25000.00 frames. ], tot_loss[loss=0.01893, audio_tagging_loss=0.01893, over 1117247.25 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:16:06,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1175946.6666666667, ans=0.0 2023-12-23 14:16:18,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176013.3333333333, ans=0.1 2023-12-23 14:16:38,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1176146.6666666667, ans=0.125 2023-12-23 14:16:51,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-12-23 14:16:57,338 INFO [train.py:886] (1/4) Epoch 38, batch 100, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 1975419.95 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:16:57,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1176280.0, ans=0.1 2023-12-23 14:17:30,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1176480.0, ans=0.125 2023-12-23 14:17:32,539 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.959e+01 4.145e+01 4.396e+01 5.235e+01, threshold=8.289e+01, percent-clipped=0.0 2023-12-23 14:17:33,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1176480.0, ans=0.0 2023-12-23 14:17:49,837 INFO [train.py:886] (1/4) Epoch 38, batch 150, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. 
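
The learning rate in the progress lines decays smoothly with the batch index within an epoch (2.90e-03 down to 2.89e-03 across epoch 37) and takes a discrete step at each epoch boundary (2.89e-03 to 2.85e-03 going into epoch 38, then 2.84e-03 shortly after). That is the shape of icefall's Eden scheduler; the formula below is a reconstruction from memory, and the settings in the example are merely plausible for this run rather than read out of it.

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Both factors are ~1.0 early on and decay like x**-0.5 once
    # batch >> lr_batches or epoch >> lr_epochs. All constants here
    # are illustrative assumptions.
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With these settings the values land in the right range, and the
# epoch-boundary step comes from the epoch factor alone:
print(eden_lr(0.045, 170_000, 37))                                # ~2.9e-03
print(eden_lr(0.045, 170_000, 37) / eden_lr(0.045, 170_000, 38))  # ~1.013, about 2.89/2.85
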
], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 2639817.76 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:17:53,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1176613.3333333333, ans=0.0 2023-12-23 14:18:14,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1176746.6666666667, ans=0.02 2023-12-23 14:18:19,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1176746.6666666667, ans=0.125 2023-12-23 14:18:19,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2023-12-23 14:18:41,137 INFO [train.py:886] (1/4) Epoch 38, batch 200, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 3156456.87 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:18:51,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-12-23 14:18:55,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1177013.3333333333, ans=0.0 2023-12-23 14:19:00,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=12.0 2023-12-23 14:19:14,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1177146.6666666667, ans=0.125 2023-12-23 14:19:16,766 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.167e+01 3.567e+01 3.766e+01 3.950e+01 4.355e+01, threshold=7.532e+01, percent-clipped=0.0 2023-12-23 14:19:32,666 INFO [train.py:886] (1/4) Epoch 38, batch 250, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 3557843.81 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:19:44,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.18 vs. limit=22.5 2023-12-23 14:19:54,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1177413.3333333333, ans=0.125 2023-12-23 14:19:58,411 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.523e-02 2023-12-23 14:20:18,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.43 vs. limit=6.0 2023-12-23 14:20:24,858 INFO [train.py:886] (1/4) Epoch 38, batch 300, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 3861483.83 frames. 
], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:20:27,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1177613.3333333333, ans=0.1 2023-12-23 14:20:39,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1177680.0, ans=10.0 2023-12-23 14:20:51,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1177746.6666666667, ans=0.0 2023-12-23 14:20:59,861 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.565e+01 3.753e+01 3.870e+01 4.486e+01, threshold=7.506e+01, percent-clipped=0.0 2023-12-23 14:21:05,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1177880.0, ans=0.0 2023-12-23 14:21:06,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1177880.0, ans=0.125 2023-12-23 14:21:15,791 INFO [train.py:886] (1/4) Epoch 38, batch 350, loss[loss=0.008064, audio_tagging_loss=0.008064, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4101652.70 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:21:17,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-12-23 14:21:19,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1177946.6666666667, ans=0.125 2023-12-23 14:21:41,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1178080.0, ans=0.125 2023-12-23 14:21:48,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1178146.6666666667, ans=0.025 2023-12-23 14:22:08,445 INFO [train.py:886] (1/4) Epoch 38, batch 400, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4288510.32 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:22:12,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1178280.0, ans=0.0 2023-12-23 14:22:15,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1178280.0, ans=0.0 2023-12-23 14:22:20,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1178346.6666666667, ans=0.0 2023-12-23 14:22:41,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1178480.0, ans=0.0 2023-12-23 14:22:43,733 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.271e+01 3.551e+01 3.719e+01 3.910e+01 4.441e+01, threshold=7.437e+01, percent-clipped=0.0 2023-12-23 14:22:45,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1178480.0, ans=0.07 2023-12-23 14:23:01,098 INFO [train.py:886] (1/4) Epoch 38, batch 450, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4437948.31 frames. 
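
The balancer constants logged here come in a few flavours: min_positive (0.025) and max_positive (0.95) bound the fraction of positive activations per channel, min_abs and max_abs bound the mean absolute value per channel (ans values from 0.02 up to 10.0), and prob is the probability of applying the constraint on a given batch. My understanding is that icefall's Balancer enforces these ranges by modifying gradients rather than the forward output. The sketch below only computes the violation statistics such a module would act on; it is an illustration, not the Balancer itself.

import torch

def balancer_stats(x, min_positive=0.025, max_positive=0.95, min_abs=0.2, max_abs=10.0):
    # x: (num_frames, num_channels). Counts channels whose fraction of
    # positive activations, or mean absolute value, falls outside the
    # configured ranges -- the statistics a Balancer-like module would
    # push back into range via the gradients. Defaults mirror common
    # ans= values in the log but are otherwise arbitrary.
    pos_frac = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "too_few_positive": int((pos_frac < min_positive).sum()),
        "too_many_positive": int((pos_frac > max_positive).sum()),
        "too_small": int((mean_abs < min_abs).sum()),
        "too_large": int((mean_abs > max_abs).sum()),
    }

print(balancer_stats(torch.randn(1000, 256)))        # all zeros: well balanced
print(balancer_stats(torch.randn(1000, 256).abs()))  # 256 channels exceed max_positive
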
], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:23:09,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=22.5 2023-12-23 14:23:21,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-12-23 14:23:27,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.48 vs. limit=10.0 2023-12-23 14:23:31,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1178813.3333333333, ans=0.0 2023-12-23 14:23:42,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2023-12-23 14:23:51,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1178946.6666666667, ans=0.125 2023-12-23 14:23:52,199 INFO [train.py:886] (1/4) Epoch 38, batch 500, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4554209.65 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:23:54,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1178946.6666666667, ans=0.125 2023-12-23 14:24:20,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2023-12-23 14:24:21,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1179080.0, ans=0.125 2023-12-23 14:24:27,113 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.290e+01 3.549e+01 3.713e+01 3.859e+01 4.188e+01, threshold=7.427e+01, percent-clipped=0.0 2023-12-23 14:24:29,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1179146.6666666667, ans=0.035 2023-12-23 14:24:44,564 INFO [train.py:886] (1/4) Epoch 38, batch 550, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4641098.10 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:25:02,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1179346.6666666667, ans=0.125 2023-12-23 14:25:08,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2023-12-23 14:25:15,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1179480.0, ans=0.0 2023-12-23 14:25:36,035 INFO [train.py:886] (1/4) Epoch 38, batch 600, loss[loss=0.01054, audio_tagging_loss=0.01054, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4714364.85 frames. 
], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:25:50,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1179680.0, ans=0.125 2023-12-23 14:25:57,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1179746.6666666667, ans=0.125 2023-12-23 14:26:03,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2023-12-23 14:26:10,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1179813.3333333333, ans=0.0 2023-12-23 14:26:11,724 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.162e+01 3.607e+01 3.767e+01 3.907e+01 4.380e+01, threshold=7.534e+01, percent-clipped=0.0 2023-12-23 14:26:12,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-12-23 14:26:13,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1179813.3333333333, ans=0.09899494936611666 2023-12-23 14:26:24,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2023-12-23 14:26:28,539 INFO [train.py:886] (1/4) Epoch 38, batch 650, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4760385.05 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:26:36,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1179946.6666666667, ans=0.125 2023-12-23 14:26:39,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1180013.3333333333, ans=0.125 2023-12-23 14:26:53,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1180080.0, ans=0.125 2023-12-23 14:26:55,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-12-23 14:27:01,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1180146.6666666667, ans=0.1 2023-12-23 14:27:20,549 INFO [train.py:886] (1/4) Epoch 38, batch 700, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4801783.73 frames. 
], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:27:26,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1180280.0, ans=0.125 2023-12-23 14:27:34,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1180346.6666666667, ans=0.2 2023-12-23 14:27:39,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1180413.3333333333, ans=0.125 2023-12-23 14:27:44,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1180413.3333333333, ans=0.125 2023-12-23 14:27:55,672 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.550e+01 3.700e+01 3.852e+01 4.697e+01, threshold=7.400e+01, percent-clipped=0.0 2023-12-23 14:27:59,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-12-23 14:28:06,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1180546.6666666667, ans=0.07 2023-12-23 14:28:11,528 INFO [train.py:886] (1/4) Epoch 38, batch 750, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4832916.62 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:28:28,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1180680.0, ans=0.95 2023-12-23 14:28:32,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1180746.6666666667, ans=0.0 2023-12-23 14:29:04,145 INFO [train.py:886] (1/4) Epoch 38, batch 800, loss[loss=0.009581, audio_tagging_loss=0.009581, over 23914.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4860058.79 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:29:04,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1180946.6666666667, ans=0.125 2023-12-23 14:29:06,150 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:29:11,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=1180946.6666666667, ans=12.0 2023-12-23 14:29:16,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1181013.3333333333, ans=0.0 2023-12-23 14:29:38,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1181146.6666666667, ans=0.0 2023-12-23 14:29:39,328 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.509e+01 3.712e+01 3.867e+01 4.445e+01, threshold=7.423e+01, percent-clipped=0.0 2023-12-23 14:29:55,992 INFO [train.py:886] (1/4) Epoch 38, batch 850, loss[loss=0.009884, audio_tagging_loss=0.009884, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4877370.42 frames. 
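
grad_scale in the progress lines sits at 32.0 for long stretches and doubles to 64.0 (as at epoch 38, batch 750 above) once enough consecutive batches produce finite fp16 gradients; it is halved again whenever an overflow is detected. That is standard dynamic loss scaling, which PyTorch exposes directly. A minimal, self-contained illustration with a toy model (not the icefall training loop; growth_interval is shortened from its default so the doubling shows up quickly):

import torch

model = torch.nn.Linear(10, 1).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# init_scale matches the 32.0 in the log; growth_interval=50 is only
# for demonstration (the default is much larger).
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=50)
x = torch.randn(64, 10, device="cuda")

for _ in range(100):
    opt.zero_grad()
    with torch.cuda.amp.autocast():   # fp16 forward pass
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()     # scale the loss so fp16 grads stay finite
    scaler.step(opt)                  # unscales grads; skips the step on inf/nan
    scaler.update()                   # doubles the scale after growth_interval
                                      # clean steps, halves it on overflow
print(scaler.get_scale())             # 128.0 here: two doublings in 100 clean steps
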
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:29:56,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1181280.0, ans=0.09899494936611666 2023-12-23 14:30:01,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1181280.0, ans=0.0 2023-12-23 14:30:05,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1181280.0, ans=0.95 2023-12-23 14:30:13,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1181346.6666666667, ans=0.1 2023-12-23 14:30:35,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1181480.0, ans=0.1 2023-12-23 14:30:47,862 INFO [train.py:886] (1/4) Epoch 38, batch 900, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4896859.28 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:30:55,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1181613.3333333333, ans=0.0 2023-12-23 14:31:08,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1181680.0, ans=0.1 2023-12-23 14:31:13,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-12-23 14:31:14,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1181746.6666666667, ans=0.0 2023-12-23 14:31:21,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1181813.3333333333, ans=0.0 2023-12-23 14:31:23,125 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.323e+01 3.602e+01 3.689e+01 3.922e+01 4.328e+01, threshold=7.378e+01, percent-clipped=0.0 2023-12-23 14:31:41,101 INFO [train.py:886] (1/4) Epoch 38, batch 950, loss[loss=0.0108, audio_tagging_loss=0.0108, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4905708.38 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:31:48,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.24 vs. limit=15.0 2023-12-23 14:31:58,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-12-23 14:32:13,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1182146.6666666667, ans=0.125 2023-12-23 14:32:32,460 INFO [train.py:886] (1/4) Epoch 38, batch 1000, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4910657.84 frames. 
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:32:47,487 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:32:50,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2023-12-23 14:32:54,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1182413.3333333333, ans=0.025 2023-12-23 14:33:03,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1182480.0, ans=0.125 2023-12-23 14:33:05,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-23 14:33:06,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1182480.0, ans=0.0 2023-12-23 14:33:07,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1182480.0, ans=0.125 2023-12-23 14:33:07,754 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.581e+01 3.724e+01 3.913e+01 4.572e+01, threshold=7.448e+01, percent-clipped=0.0 2023-12-23 14:33:24,487 INFO [train.py:886] (1/4) Epoch 38, batch 1050, loss[loss=0.01008, audio_tagging_loss=0.01008, over 21844.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4912810.25 frames. ], batch size: 107, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:33:25,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1182613.3333333333, ans=0.2 2023-12-23 14:33:41,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1182680.0, ans=0.0 2023-12-23 14:33:42,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1182680.0, ans=0.2 2023-12-23 14:34:05,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1182880.0, ans=0.0 2023-12-23 14:34:07,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1182880.0, ans=0.0 2023-12-23 14:34:16,318 INFO [train.py:886] (1/4) Epoch 38, batch 1100, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4921610.94 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:34:31,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. 
limit=15.0 2023-12-23 14:34:43,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1183080.0, ans=0.125 2023-12-23 14:34:44,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1183080.0, ans=0.2 2023-12-23 14:34:52,073 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.541e+01 3.703e+01 3.876e+01 4.507e+01, threshold=7.407e+01, percent-clipped=0.0 2023-12-23 14:34:54,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1183146.6666666667, ans=0.125 2023-12-23 14:34:56,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-12-23 14:35:01,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1183213.3333333333, ans=0.0 2023-12-23 14:35:03,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1183213.3333333333, ans=0.0 2023-12-23 14:35:07,331 INFO [train.py:886] (1/4) Epoch 38, batch 1150, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4930507.35 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:35:16,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2023-12-23 14:35:21,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1183346.6666666667, ans=0.125 2023-12-23 14:36:00,179 INFO [train.py:886] (1/4) Epoch 38, batch 1200, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4941526.44 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:36:35,472 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.644e+01 3.844e+01 4.002e+01 4.470e+01, threshold=7.688e+01, percent-clipped=0.0 2023-12-23 14:36:43,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1183880.0, ans=0.0 2023-12-23 14:36:44,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=22.5 2023-12-23 14:36:51,366 INFO [train.py:886] (1/4) Epoch 38, batch 1250, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4938273.78 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:36:53,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.25 vs. 
limit=22.5 2023-12-23 14:37:04,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1184013.3333333333, ans=0.2 2023-12-23 14:37:06,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1184013.3333333333, ans=0.125 2023-12-23 14:37:07,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1184013.3333333333, ans=0.0 2023-12-23 14:37:12,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1184080.0, ans=0.2 2023-12-23 14:37:14,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1184080.0, ans=0.0 2023-12-23 14:37:27,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1184146.6666666667, ans=0.1 2023-12-23 14:37:43,632 INFO [train.py:886] (1/4) Epoch 38, batch 1300, loss[loss=0.01333, audio_tagging_loss=0.01333, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4936302.16 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:37:46,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1184280.0, ans=0.125 2023-12-23 14:38:09,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1184413.3333333333, ans=0.1 2023-12-23 14:38:18,640 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.243e+01 3.655e+01 3.794e+01 4.002e+01 4.408e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 14:38:27,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1184546.6666666667, ans=0.1 2023-12-23 14:38:35,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-12-23 14:38:35,825 INFO [train.py:886] (1/4) Epoch 38, batch 1350, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4935043.36 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:38:39,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1184613.3333333333, ans=0.1 2023-12-23 14:39:07,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1184813.3333333333, ans=0.125 2023-12-23 14:39:26,139 INFO [train.py:886] (1/4) Epoch 38, batch 1400, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4937434.91 frames. 
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:39:26,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1184946.6666666667, ans=0.1 2023-12-23 14:39:48,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1185080.0, ans=0.1 2023-12-23 14:39:52,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185080.0, ans=0.1 2023-12-23 14:39:52,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1185080.0, ans=0.0 2023-12-23 14:40:01,448 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.205e+01 3.485e+01 3.664e+01 3.812e+01 4.482e+01, threshold=7.328e+01, percent-clipped=0.0 2023-12-23 14:40:19,457 INFO [train.py:886] (1/4) Epoch 38, batch 1450, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4939401.59 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:40:20,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1185280.0, ans=0.2 2023-12-23 14:40:25,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1185280.0, ans=0.125 2023-12-23 14:40:58,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1185480.0, ans=0.0 2023-12-23 14:41:06,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1185546.6666666667, ans=0.0 2023-12-23 14:41:08,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1185546.6666666667, ans=0.0 2023-12-23 14:41:11,599 INFO [train.py:886] (1/4) Epoch 38, batch 1500, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4944573.57 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:41:15,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1185613.3333333333, ans=0.125 2023-12-23 14:41:38,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1185746.6666666667, ans=0.125 2023-12-23 14:41:42,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1185813.3333333333, ans=0.125 2023-12-23 14:41:43,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185813.3333333333, ans=0.1 2023-12-23 14:41:47,044 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.260e+01 3.559e+01 3.727e+01 3.866e+01 4.257e+01, threshold=7.454e+01, percent-clipped=0.0 2023-12-23 14:41:55,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1185880.0, ans=0.0 2023-12-23 14:41:58,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1185880.0, ans=0.0 2023-12-23 14:42:03,334 INFO [train.py:886] (1/4) Epoch 38, batch 1550, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24048.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4942770.06 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:42:08,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1185946.6666666667, ans=0.05 2023-12-23 14:42:08,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2023-12-23 14:42:20,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-12-23 14:42:20,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1186013.3333333333, ans=0.0 2023-12-23 14:42:23,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1186080.0, ans=0.2 2023-12-23 14:42:30,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1186080.0, ans=0.125 2023-12-23 14:42:30,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.69 vs. limit=15.0 2023-12-23 14:42:53,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1186213.3333333333, ans=0.0 2023-12-23 14:42:55,881 INFO [train.py:886] (1/4) Epoch 38, batch 1600, loss[loss=0.008773, audio_tagging_loss=0.008773, over 24090.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4943601.94 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:43:07,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1186346.6666666667, ans=0.07 2023-12-23 14:43:15,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1186413.3333333333, ans=0.125 2023-12-23 14:43:23,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1186413.3333333333, ans=0.2 2023-12-23 14:43:30,929 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.275e+01 3.656e+01 3.741e+01 3.949e+01 4.800e+01, threshold=7.482e+01, percent-clipped=0.0 2023-12-23 14:43:38,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1186546.6666666667, ans=0.125 2023-12-23 14:43:46,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1186613.3333333333, ans=0.125 2023-12-23 14:43:47,042 INFO [train.py:886] (1/4) Epoch 38, batch 1650, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4941203.59 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:43:51,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1186613.3333333333, ans=0.0 2023-12-23 14:44:02,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1186680.0, ans=0.0 2023-12-23 14:44:09,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1186746.6666666667, ans=0.04949747468305833 2023-12-23 14:44:10,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1186746.6666666667, ans=0.125 2023-12-23 14:44:40,023 INFO [train.py:886] (1/4) Epoch 38, batch 1700, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4947670.75 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:45:10,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1187146.6666666667, ans=0.0 2023-12-23 14:45:15,256 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.164e+01 3.542e+01 3.699e+01 3.849e+01 4.948e+01, threshold=7.398e+01, percent-clipped=0.0 2023-12-23 14:45:16,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-12-23 14:45:32,639 INFO [train.py:886] (1/4) Epoch 38, batch 1750, loss[loss=0.01049, audio_tagging_loss=0.01049, over 22462.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4946942.26 frames. 
], batch size: 107, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:45:36,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1187280.0, ans=0.2 2023-12-23 14:45:38,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1187280.0, ans=0.125 2023-12-23 14:46:06,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1187480.0, ans=0.0 2023-12-23 14:46:08,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1187480.0, ans=0.05 2023-12-23 14:46:23,377 INFO [train.py:886] (1/4) Epoch 38, batch 1800, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4948073.76 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:46:23,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1187613.3333333333, ans=0.125 2023-12-23 14:46:52,894 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:46:58,333 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.277e+01 3.639e+01 3.766e+01 3.892e+01 4.518e+01, threshold=7.532e+01, percent-clipped=0.0 2023-12-23 14:47:10,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1187880.0, ans=0.0 2023-12-23 14:47:13,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1187880.0, ans=0.125 2023-12-23 14:47:15,570 INFO [train.py:886] (1/4) Epoch 38, batch 1850, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24013.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4951633.53 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:47:17,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1187946.6666666667, ans=0.125 2023-12-23 14:47:22,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2023-12-23 14:47:31,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-12-23 14:47:39,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. 
limit=15.0 2023-12-23 14:47:39,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1188080.0, ans=0.125 2023-12-23 14:47:46,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1188146.6666666667, ans=0.0 2023-12-23 14:47:48,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1188146.6666666667, ans=0.0 2023-12-23 14:47:58,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1188213.3333333333, ans=0.0 2023-12-23 14:48:07,148 INFO [train.py:886] (1/4) Epoch 38, batch 1900, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4944029.79 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:48:10,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1188280.0, ans=0.1 2023-12-23 14:48:16,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1188280.0, ans=0.0 2023-12-23 14:48:20,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1188346.6666666667, ans=0.125 2023-12-23 14:48:22,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2023-12-23 14:48:30,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1188413.3333333333, ans=0.125 2023-12-23 14:48:43,883 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.314e+01 3.569e+01 3.756e+01 3.902e+01 4.536e+01, threshold=7.513e+01, percent-clipped=0.0 2023-12-23 14:48:45,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1188480.0, ans=0.07 2023-12-23 14:48:45,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1188480.0, ans=0.1 2023-12-23 14:48:59,027 INFO [train.py:886] (1/4) Epoch 38, batch 1950, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4936687.91 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:49:25,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2023-12-23 14:49:32,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1188813.3333333333, ans=0.125 2023-12-23 14:49:42,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1188880.0, ans=0.2 2023-12-23 14:49:51,568 INFO [train.py:886] (1/4) Epoch 38, batch 2000, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4930679.26 frames. 
], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:50:13,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1189080.0, ans=0.125 2023-12-23 14:50:22,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1189146.6666666667, ans=0.125 2023-12-23 14:50:24,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1189146.6666666667, ans=0.0 2023-12-23 14:50:26,861 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.948e+01 3.554e+01 3.702e+01 3.907e+01 4.356e+01, threshold=7.404e+01, percent-clipped=0.0 2023-12-23 14:50:27,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1189146.6666666667, ans=0.125 2023-12-23 14:50:36,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1189213.3333333333, ans=0.125 2023-12-23 14:50:42,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5 2023-12-23 14:50:43,020 INFO [train.py:886] (1/4) Epoch 38, batch 2050, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4933137.35 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:50:58,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1189346.6666666667, ans=12.0 2023-12-23 14:51:07,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1189413.3333333333, ans=0.0 2023-12-23 14:51:35,119 INFO [train.py:886] (1/4) Epoch 38, batch 2100, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4931925.53 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:51:41,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1189613.3333333333, ans=0.0 2023-12-23 14:51:41,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1189613.3333333333, ans=0.0 2023-12-23 14:51:42,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1189613.3333333333, ans=0.0 2023-12-23 14:51:51,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1189680.0, ans=0.1 2023-12-23 14:52:00,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1189746.6666666667, ans=0.0 2023-12-23 14:52:06,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1189813.3333333333, ans=0.1 2023-12-23 14:52:09,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.58 vs. 
limit=12.0 2023-12-23 14:52:09,629 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.222e+01 3.562e+01 3.709e+01 3.873e+01 4.397e+01, threshold=7.419e+01, percent-clipped=0.0 2023-12-23 14:52:25,601 INFO [train.py:886] (1/4) Epoch 38, batch 2150, loss[loss=0.01038, audio_tagging_loss=0.01038, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4944130.06 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:52:45,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1190080.0, ans=0.0 2023-12-23 14:52:50,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2023-12-23 14:52:58,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1190146.6666666667, ans=15.0 2023-12-23 14:53:10,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-12-23 14:53:17,054 INFO [train.py:886] (1/4) Epoch 38, batch 2200, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4940923.89 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:53:25,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1190280.0, ans=0.125 2023-12-23 14:53:28,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-12-23 14:53:34,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1190346.6666666667, ans=0.125 2023-12-23 14:53:34,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1190346.6666666667, ans=0.0 2023-12-23 14:53:35,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1190346.6666666667, ans=0.125 2023-12-23 14:53:36,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190346.6666666667, ans=0.1 2023-12-23 14:53:51,995 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.641e+01 3.770e+01 3.908e+01 4.334e+01, threshold=7.540e+01, percent-clipped=0.0 2023-12-23 14:54:09,916 INFO [train.py:886] (1/4) Epoch 38, batch 2250, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24933.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4933332.05 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:54:27,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1190680.0, ans=0.125 2023-12-23 14:54:34,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1190746.6666666667, ans=0.125 2023-12-23 14:54:40,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1190813.3333333333, ans=0.0 2023-12-23 14:54:40,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1190813.3333333333, ans=0.2 2023-12-23 14:55:00,350 INFO [train.py:886] (1/4) Epoch 38, batch 2300, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4930058.43 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:55:12,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1191013.3333333333, ans=0.0 2023-12-23 14:55:25,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1191080.0, ans=0.2 2023-12-23 14:55:30,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1191146.6666666667, ans=0.125 2023-12-23 14:55:31,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1191146.6666666667, ans=0.0 2023-12-23 14:55:35,742 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.145e+01 3.555e+01 3.723e+01 3.863e+01 4.649e+01, threshold=7.446e+01, percent-clipped=0.0 2023-12-23 14:55:39,203 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:55:41,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1191146.6666666667, ans=0.125 2023-12-23 14:55:48,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1191213.3333333333, ans=0.125 2023-12-23 14:55:52,324 INFO [train.py:886] (1/4) Epoch 38, batch 2350, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4936610.58 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:55:53,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1191280.0, ans=0.125 2023-12-23 14:56:02,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.08 vs. 
limit=12.0 2023-12-23 14:56:10,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1191346.6666666667, ans=0.125 2023-12-23 14:56:13,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1191413.3333333333, ans=0.125 2023-12-23 14:56:25,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1191480.0, ans=0.125 2023-12-23 14:56:25,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5 2023-12-23 14:56:29,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1191480.0, ans=0.1 2023-12-23 14:56:32,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1191546.6666666667, ans=0.0 2023-12-23 14:56:37,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-12-23 14:56:38,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1191546.6666666667, ans=0.0 2023-12-23 14:56:44,878 INFO [train.py:886] (1/4) Epoch 38, batch 2400, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4942758.56 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:56:47,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1191613.3333333333, ans=0.025 2023-12-23 14:57:02,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1191680.0, ans=0.125 2023-12-23 14:57:06,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1191746.6666666667, ans=0.0 2023-12-23 14:57:08,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1191746.6666666667, ans=0.125 2023-12-23 14:57:13,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1191746.6666666667, ans=0.125 2023-12-23 14:57:20,631 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.189e+01 3.485e+01 3.668e+01 3.843e+01 4.329e+01, threshold=7.336e+01, percent-clipped=0.0 2023-12-23 14:57:33,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1191880.0, ans=0.2 2023-12-23 14:57:36,116 INFO [train.py:886] (1/4) Epoch 38, batch 2450, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4951168.79 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:57:44,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1191946.6666666667, ans=0.0 2023-12-23 14:57:44,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1191946.6666666667, ans=0.2 2023-12-23 14:57:54,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.29 vs. limit=15.0 2023-12-23 14:58:06,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1192146.6666666667, ans=0.2 2023-12-23 14:58:07,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1192146.6666666667, ans=0.125 2023-12-23 14:58:16,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1192213.3333333333, ans=0.0 2023-12-23 14:58:23,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1192213.3333333333, ans=0.125 2023-12-23 14:58:29,279 INFO [train.py:886] (1/4) Epoch 38, batch 2500, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4951766.24 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:58:31,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1192280.0, ans=0.09899494936611666 2023-12-23 14:58:52,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1192413.3333333333, ans=0.0 2023-12-23 14:58:56,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1192413.3333333333, ans=0.0 2023-12-23 14:59:04,279 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.151e+01 3.642e+01 3.821e+01 3.975e+01 4.588e+01, threshold=7.642e+01, percent-clipped=0.0 2023-12-23 14:59:06,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1192480.0, ans=0.125 2023-12-23 14:59:09,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1192546.6666666667, ans=0.125 2023-12-23 14:59:10,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1192546.6666666667, ans=0.125 2023-12-23 14:59:20,247 INFO [train.py:886] (1/4) Epoch 38, batch 2550, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4941173.79 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:59:20,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. 
limit=22.5 2023-12-23 14:59:28,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1192613.3333333333, ans=0.125 2023-12-23 14:59:31,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1192680.0, ans=0.125 2023-12-23 14:59:44,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1192746.6666666667, ans=0.0 2023-12-23 14:59:52,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-12-23 14:59:54,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1192813.3333333333, ans=0.2 2023-12-23 15:00:03,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0 2023-12-23 15:00:08,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-12-23 15:00:12,992 INFO [train.py:886] (1/4) Epoch 38, batch 2600, loss[loss=0.009468, audio_tagging_loss=0.009468, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4939809.87 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 15:00:32,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1193013.3333333333, ans=0.125 2023-12-23 15:00:33,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0 2023-12-23 15:00:37,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2023-12-23 15:00:37,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1193080.0, ans=0.0 2023-12-23 15:00:47,979 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.256e+01 3.573e+01 3.736e+01 3.903e+01 5.523e+01, threshold=7.472e+01, percent-clipped=0.0 2023-12-23 15:00:50,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1193146.6666666667, ans=0.125 2023-12-23 15:00:59,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1193213.3333333333, ans=0.125 2023-12-23 15:01:05,288 INFO [train.py:886] (1/4) Epoch 38, batch 2650, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4945377.45 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 15:01:21,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1193346.6666666667, ans=0.125 2023-12-23 15:01:56,561 INFO [train.py:886] (1/4) Epoch 38, batch 2700, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4952374.86 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 15:02:12,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1193680.0, ans=0.125 2023-12-23 15:02:19,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1193746.6666666667, ans=0.0 2023-12-23 15:02:26,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0 2023-12-23 15:02:30,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1193813.3333333333, ans=0.125 2023-12-23 15:02:32,430 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.575e+01 3.684e+01 3.818e+01 4.506e+01, threshold=7.367e+01, percent-clipped=0.0 2023-12-23 15:02:33,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1193813.3333333333, ans=0.125 2023-12-23 15:02:42,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.08 vs. limit=10.0 2023-12-23 15:02:49,103 INFO [train.py:886] (1/4) Epoch 38, batch 2750, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4953074.64 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:02:52,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0 2023-12-23 15:02:55,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1193946.6666666667, ans=0.025 2023-12-23 15:03:06,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1194013.3333333333, ans=0.125 2023-12-23 15:03:10,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1194080.0, ans=10.0 2023-12-23 15:03:11,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1194080.0, ans=0.0 2023-12-23 15:03:20,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0 2023-12-23 15:03:28,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1194146.6666666667, ans=0.0 2023-12-23 15:03:39,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2023-12-23 15:03:39,925 INFO [train.py:886] (1/4) Epoch 38, batch 2800, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24022.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4949807.76 frames. 
], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:03:40,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1194280.0, ans=0.0 2023-12-23 15:03:40,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1194280.0, ans=0.0 2023-12-23 15:03:46,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5 2023-12-23 15:04:15,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.677e+01 3.839e+01 3.970e+01 4.490e+01, threshold=7.678e+01, percent-clipped=0.0 2023-12-23 15:04:23,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1194546.6666666667, ans=0.1 2023-12-23 15:04:31,029 INFO [train.py:886] (1/4) Epoch 38, batch 2850, loss[loss=0.009774, audio_tagging_loss=0.009774, over 21496.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4939138.42 frames. ], batch size: 107, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:04:37,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1194613.3333333333, ans=0.125 2023-12-23 15:04:41,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1194680.0, ans=0.2 2023-12-23 15:04:47,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1194680.0, ans=0.125 2023-12-23 15:04:49,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1194680.0, ans=0.1 2023-12-23 15:04:49,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1194680.0, ans=0.125 2023-12-23 15:04:52,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1194746.6666666667, ans=0.2 2023-12-23 15:04:54,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1194746.6666666667, ans=0.0 2023-12-23 15:05:12,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.30 vs. limit=5.0 2023-12-23 15:05:23,372 INFO [train.py:886] (1/4) Epoch 38, batch 2900, loss[loss=0.01031, audio_tagging_loss=0.01031, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4939275.44 frames. 
], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:05:29,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1194946.6666666667, ans=0.05 2023-12-23 15:05:44,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1195080.0, ans=0.0 2023-12-23 15:05:45,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1195080.0, ans=0.125 2023-12-23 15:05:52,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1195080.0, ans=0.125 2023-12-23 15:05:57,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1195146.6666666667, ans=0.125 2023-12-23 15:05:58,809 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.116e+01 3.532e+01 3.702e+01 3.923e+01 4.475e+01, threshold=7.405e+01, percent-clipped=0.0 2023-12-23 15:06:14,384 INFO [train.py:886] (1/4) Epoch 38, batch 2950, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4941824.31 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:06:28,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1195346.6666666667, ans=0.0 2023-12-23 15:06:29,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1195346.6666666667, ans=0.125 2023-12-23 15:06:49,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1195480.0, ans=0.1 2023-12-23 15:06:50,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1195480.0, ans=0.0 2023-12-23 15:07:03,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1195613.3333333333, ans=0.125 2023-12-23 15:07:04,604 INFO [train.py:886] (1/4) Epoch 38, batch 3000, loss[loss=0.01073, audio_tagging_loss=0.01073, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4946017.59 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:07:04,604 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 15:07:15,937 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3214, 3.3700, 4.0642, 3.7818], device='cuda:1') 2023-12-23 15:07:23,478 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0753, 5.8490, 5.8095, 5.9840], device='cuda:1') 2023-12-23 15:07:25,413 INFO [train.py:917] (1/4) Epoch 38, validation: loss=0.03488, audio_tagging_loss=0.03488, over 3737520.00 frames. 
2023-12-23 15:07:25,414 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 15:07:29,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1195613.3333333333, ans=0.125 2023-12-23 15:07:52,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1195746.6666666667, ans=0.125 2023-12-23 15:07:56,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1195813.3333333333, ans=0.0 2023-12-23 15:08:01,317 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.200e+01 3.533e+01 3.703e+01 3.872e+01 4.647e+01, threshold=7.406e+01, percent-clipped=0.0 2023-12-23 15:08:07,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-12-23 15:08:13,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1195880.0, ans=0.09899494936611666 2023-12-23 15:08:16,285 INFO [train.py:886] (1/4) Epoch 38, batch 3050, loss[loss=0.01045, audio_tagging_loss=0.01045, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4949044.40 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:08:21,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1195946.6666666667, ans=0.1 2023-12-23 15:08:31,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1196013.3333333333, ans=0.125 2023-12-23 15:08:35,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2023-12-23 15:08:39,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1196080.0, ans=0.125 2023-12-23 15:08:42,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1196080.0, ans=0.0 2023-12-23 15:08:53,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1196146.6666666667, ans=0.125 2023-12-23 15:08:57,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2023-12-23 15:09:05,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1196213.3333333333, ans=0.125 2023-12-23 15:09:08,027 INFO [train.py:886] (1/4) Epoch 38, batch 3100, loss[loss=0.01134, audio_tagging_loss=0.01134, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4953969.86 frames. 
], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:09:08,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1196280.0, ans=0.1 2023-12-23 15:09:12,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1196280.0, ans=0.1 2023-12-23 15:09:43,191 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+01 3.615e+01 3.756e+01 3.923e+01 4.234e+01, threshold=7.513e+01, percent-clipped=0.0 2023-12-23 15:09:43,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1196480.0, ans=0.125 2023-12-23 15:09:57,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2023-12-23 15:10:00,175 INFO [train.py:886] (1/4) Epoch 38, batch 3150, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4950360.89 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:10:11,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1196680.0, ans=0.125 2023-12-23 15:10:14,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1196680.0, ans=0.125 2023-12-23 15:10:31,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-12-23 15:10:37,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1196813.3333333333, ans=0.1 2023-12-23 15:10:39,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1196880.0, ans=0.0 2023-12-23 15:10:49,011 INFO [train.py:886] (1/4) Epoch 38, batch 3200, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4949320.04 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:11:03,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1197013.3333333333, ans=0.0 2023-12-23 15:11:20,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1197146.6666666667, ans=0.125 2023-12-23 15:11:23,712 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.251e+01 3.566e+01 3.766e+01 3.939e+01 4.413e+01, threshold=7.532e+01, percent-clipped=0.0 2023-12-23 15:11:34,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1197213.3333333333, ans=0.125 2023-12-23 15:11:40,201 INFO [train.py:886] (1/4) Epoch 38, batch 3250, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4944347.89 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:11:51,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.61 vs. 
limit=22.5 2023-12-23 15:11:51,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1197346.6666666667, ans=0.0 2023-12-23 15:11:57,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1197346.6666666667, ans=0.025 2023-12-23 15:12:22,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1197546.6666666667, ans=0.0 2023-12-23 15:12:30,372 INFO [train.py:886] (1/4) Epoch 38, batch 3300, loss[loss=0.01076, audio_tagging_loss=0.01076, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4945680.83 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:12:38,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1197613.3333333333, ans=0.125 2023-12-23 15:12:39,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-12-23 15:12:41,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2023-12-23 15:12:44,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1197680.0, ans=0.1 2023-12-23 15:12:49,160 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:13:07,160 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.537e+01 3.698e+01 3.881e+01 4.434e+01, threshold=7.396e+01, percent-clipped=0.0 2023-12-23 15:13:13,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1197880.0, ans=0.125 2023-12-23 15:13:17,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2023-12-23 15:13:21,889 INFO [train.py:886] (1/4) Epoch 38, batch 3350, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4946775.85 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:13:38,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1198013.3333333333, ans=0.0 2023-12-23 15:14:01,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1198213.3333333333, ans=0.125 2023-12-23 15:14:12,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1198280.0, ans=0.125 2023-12-23 15:14:12,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1198280.0, ans=0.125 2023-12-23 15:14:12,756 INFO [train.py:886] (1/4) Epoch 38, batch 3400, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4949576.04 frames. 
], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:14:22,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1198346.6666666667, ans=0.0 2023-12-23 15:14:26,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.18 vs. limit=15.0 2023-12-23 15:14:26,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1198346.6666666667, ans=0.125 2023-12-23 15:14:39,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1198413.3333333333, ans=0.0 2023-12-23 15:14:48,196 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.386e+01 3.674e+01 3.823e+01 3.991e+01 4.333e+01, threshold=7.645e+01, percent-clipped=0.0 2023-12-23 15:14:49,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1198480.0, ans=0.125 2023-12-23 15:14:56,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1198546.6666666667, ans=0.0 2023-12-23 15:15:02,402 INFO [train.py:886] (1/4) Epoch 38, batch 3450, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4944330.57 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:15:18,309 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:15:26,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1198746.6666666667, ans=0.125 2023-12-23 15:15:27,616 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:15:28,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1198746.6666666667, ans=0.0 2023-12-23 15:15:38,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-12-23 15:15:54,834 INFO [train.py:886] (1/4) Epoch 38, batch 3500, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4937405.45 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:15:57,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1198946.6666666667, ans=0.125 2023-12-23 15:16:01,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2023-12-23 15:16:05,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-12-23 15:16:30,767 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.551e+01 3.681e+01 3.822e+01 4.366e+01, threshold=7.362e+01, percent-clipped=0.0 2023-12-23 15:16:45,624 INFO [train.py:886] (1/4) Epoch 38, batch 3550, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. 
], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4934147.99 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0 2023-12-23 15:16:48,456 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:16:51,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1199280.0, ans=0.125 2023-12-23 15:16:55,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1199346.6666666667, ans=10.0 2023-12-23 15:17:01,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-12-23 15:17:03,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1199346.6666666667, ans=0.125 2023-12-23 15:17:20,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1199480.0, ans=0.2 2023-12-23 15:17:32,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1199546.6666666667, ans=0.125 2023-12-23 15:17:36,682 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:17:37,478 INFO [train.py:886] (1/4) Epoch 38, batch 3600, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4940481.56 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:17:51,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1199680.0, ans=0.0 2023-12-23 15:17:56,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1199680.0, ans=0.0 2023-12-23 15:17:57,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1199746.6666666667, ans=0.09899494936611666 2023-12-23 15:18:01,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1199746.6666666667, ans=0.125 2023-12-23 15:18:08,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.36 vs. limit=22.5 2023-12-23 15:18:10,082 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:18:10,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5 2023-12-23 15:18:14,461 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.519e+01 3.755e+01 3.929e+01 4.573e+01, threshold=7.510e+01, percent-clipped=0.0 2023-12-23 15:18:29,739 INFO [train.py:886] (1/4) Epoch 38, batch 3650, loss[loss=0.009256, audio_tagging_loss=0.009256, over 24044.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4941719.68 frames. 
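
Note the grad_scale field dropping from 64.0 to 32.0 between batches 3550 and 3600 above. That halving is characteristic of dynamic loss scaling in mixed-precision training, where the scaler backs off whenever non-finite gradients are detected. A minimal sketch of the behaviour using stock PyTorch AMP; the training-loop names are placeholders and this run's actual scaler management may differ:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=64.0, backoff_factor=0.5)

    def train_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # internally skipped if grads are inf/nan
        scaler.update()          # halves the scale (backoff_factor) on overflow
        return loss.detach(), scaler.get_scale()
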
], batch size: 100, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:18:37,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1199946.6666666667, ans=0.125 2023-12-23 15:18:43,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1200013.3333333333, ans=0.2 2023-12-23 15:19:00,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1200080.0, ans=0.125 2023-12-23 15:19:14,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1200213.3333333333, ans=0.125 2023-12-23 15:19:15,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1200213.3333333333, ans=0.125 2023-12-23 15:19:23,000 INFO [train.py:886] (1/4) Epoch 38, batch 3700, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4948537.40 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:19:58,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2023-12-23 15:20:00,558 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.223e+01 3.534e+01 3.731e+01 3.943e+01 4.301e+01, threshold=7.462e+01, percent-clipped=0.0 2023-12-23 15:20:10,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=12.0 2023-12-23 15:20:15,141 INFO [train.py:886] (1/4) Epoch 38, batch 3750, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4946841.14 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:20:21,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2023-12-23 15:20:32,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2023-12-23 15:20:44,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1200746.6666666667, ans=0.0 2023-12-23 15:20:53,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1200813.3333333333, ans=0.2 2023-12-23 15:21:06,192 INFO [train.py:886] (1/4) Epoch 38, batch 3800, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4941755.78 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:21:09,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1200946.6666666667, ans=0.0 2023-12-23 15:21:14,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.18 vs. 
limit=15.0 2023-12-23 15:21:24,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1201013.3333333333, ans=0.125 2023-12-23 15:21:28,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2023-12-23 15:21:32,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1201080.0, ans=0.0 2023-12-23 15:21:43,259 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.611e+01 3.778e+01 3.951e+01 5.109e+01, threshold=7.557e+01, percent-clipped=0.0 2023-12-23 15:21:47,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1201213.3333333333, ans=0.09899494936611666 2023-12-23 15:21:57,326 INFO [train.py:886] (1/4) Epoch 38, batch 3850, loss[loss=0.01031, audio_tagging_loss=0.01031, over 22591.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4942963.28 frames. ], batch size: 107, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:21:59,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0 2023-12-23 15:22:05,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1201280.0, ans=0.0 2023-12-23 15:22:14,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1201346.6666666667, ans=0.125 2023-12-23 15:22:18,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1201413.3333333333, ans=0.0 2023-12-23 15:22:26,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-12-23 15:22:38,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1201546.6666666667, ans=0.125 2023-12-23 15:22:50,003 INFO [train.py:886] (1/4) Epoch 38, batch 3900, loss[loss=0.00955, audio_tagging_loss=0.00955, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4950028.10 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:23:04,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1201680.0, ans=0.125 2023-12-23 15:23:16,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1201746.6666666667, ans=0.125 2023-12-23 15:23:26,945 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.160e+01 3.573e+01 3.725e+01 3.871e+01 4.594e+01, threshold=7.451e+01, percent-clipped=0.0 2023-12-23 15:23:32,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1201880.0, ans=0.1 2023-12-23 15:23:37,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. 
limit=15.0 2023-12-23 15:23:39,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1201880.0, ans=0.125 2023-12-23 15:23:41,551 INFO [train.py:886] (1/4) Epoch 38, batch 3950, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4955390.74 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0 2023-12-23 15:23:43,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1201946.6666666667, ans=0.125 2023-12-23 15:23:59,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1202013.3333333333, ans=0.0 2023-12-23 15:24:09,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1202080.0, ans=0.025 2023-12-23 15:24:33,216 INFO [train.py:886] (1/4) Epoch 38, batch 4000, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4960503.53 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:24:39,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1202280.0, ans=0.125 2023-12-23 15:24:40,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1202280.0, ans=0.07 2023-12-23 15:24:59,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1202413.3333333333, ans=0.1 2023-12-23 15:25:03,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1202480.0, ans=0.125 2023-12-23 15:25:04,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1202480.0, ans=0.5 2023-12-23 15:25:10,934 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.581e+01 3.725e+01 3.897e+01 4.371e+01, threshold=7.451e+01, percent-clipped=0.0 2023-12-23 15:25:11,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1202480.0, ans=0.125 2023-12-23 15:25:13,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1202480.0, ans=0.125 2023-12-23 15:25:20,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1202546.6666666667, ans=0.1 2023-12-23 15:25:26,264 INFO [train.py:886] (1/4) Epoch 38, batch 4050, loss[loss=0.009536, audio_tagging_loss=0.009536, over 24060.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4957354.85 frames. 
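
Note the shape of the tot_loss records: each batch contributes a loss over roughly 25k frames, while tot_loss is reported over a frame count that hovers near 4.95 million. That is consistent with a decaying, frame-weighted accumulator: with a decay of 0.995 and ~24,750-frame batches, the steady-state count is 24750 / (1 - 0.995) = 4.95M frames. A sketch of such an accumulator (the decay constant is an inference from the logged numbers, not taken from train.py):

    class RunningLoss:
        # Frame-weighted running loss with exponential forgetting.
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed frame count

        def update(self, batch_loss: float, batch_frames: float):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames, self.frames

    tracker = RunningLoss()
    tot, n = tracker.update(0.01167, 24750.0)
    print(f"tot_loss[loss={tot:.5f}, over {n:.2f} frames.]")
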
], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:25:26,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1202613.3333333333, ans=0.0 2023-12-23 15:25:28,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1202613.3333333333, ans=0.0 2023-12-23 15:25:28,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2023-12-23 15:25:35,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1202680.0, ans=0.0 2023-12-23 15:25:48,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1202746.6666666667, ans=0.1 2023-12-23 15:25:58,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1202813.3333333333, ans=0.0 2023-12-23 15:26:00,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2023-12-23 15:26:05,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1202880.0, ans=0.0 2023-12-23 15:26:16,822 INFO [train.py:886] (1/4) Epoch 38, batch 4100, loss[loss=0.0115, audio_tagging_loss=0.0115, over 23994.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4949443.67 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:26:18,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1202946.6666666667, ans=0.125 2023-12-23 15:26:37,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1203080.0, ans=0.1 2023-12-23 15:26:49,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-12-23 15:26:54,192 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.608e+01 3.845e+01 3.998e+01 4.535e+01, threshold=7.691e+01, percent-clipped=0.0 2023-12-23 15:26:54,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-12-23 15:26:56,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1203146.6666666667, ans=10.0 2023-12-23 15:26:56,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1203146.6666666667, ans=0.125 2023-12-23 15:27:08,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1203280.0, ans=0.125 2023-12-23 15:27:09,034 INFO [train.py:886] (1/4) Epoch 38, batch 4150, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4947335.08 frames. 
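
The Whitening records compare a per-module metric against a module-specific limit (6.0 for whiten_keys here, 10.0, 15.0 or 22.5 elsewhere). One natural whiteness metric is the dispersion of the activation-covariance eigenvalues, mean(eig^2) / mean(eig)^2, which is 1.0 for a perfectly white covariance and grows as channels become correlated; the sketch below is a plausible reconstruction of such a diagnostic, not necessarily the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        # x: (num_frames, num_channels); channels are split into groups.
        n, c = x.shape
        d = c // num_groups
        x = x.reshape(n, num_groups, d).transpose(0, 1)  # (groups, n, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n     # (groups, d, d)
        tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)     # trace(C)
        tr_sq = (cov * cov).sum(dim=(1, 2))              # trace(C @ C), C symmetric
        metric = (tr_sq / d) / (tr / d) ** 2             # mean(eig^2) / mean(eig)^2
        return metric.mean().item()

    x = torch.randn(10000, 64)                  # nearly-white activations
    print(whitening_metric(x, num_groups=1))    # close to 1.0
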
], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:27:25,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2023-12-23 15:27:34,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203413.3333333333, ans=0.1 2023-12-23 15:27:47,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1203480.0, ans=0.025 2023-12-23 15:27:49,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.48 vs. limit=5.0 2023-12-23 15:27:54,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1203546.6666666667, ans=0.1 2023-12-23 15:28:00,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203613.3333333333, ans=0.1 2023-12-23 15:28:01,532 INFO [train.py:886] (1/4) Epoch 38, batch 4200, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4950576.48 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:28:01,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2023-12-23 15:28:05,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1203613.3333333333, ans=0.2 2023-12-23 15:28:21,700 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:28:26,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2023-12-23 15:28:27,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1203746.6666666667, ans=0.2 2023-12-23 15:28:39,423 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.592e+01 3.742e+01 3.876e+01 4.221e+01, threshold=7.484e+01, percent-clipped=0.0 2023-12-23 15:28:45,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5 2023-12-23 15:28:52,645 INFO [train.py:886] (1/4) Epoch 38, batch 4250, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4956082.62 frames. 
], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:29:07,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1204013.3333333333, ans=0.125 2023-12-23 15:29:22,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1204080.0, ans=0.125 2023-12-23 15:29:25,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1204146.6666666667, ans=0.125 2023-12-23 15:29:39,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1204213.3333333333, ans=0.125 2023-12-23 15:29:45,928 INFO [train.py:886] (1/4) Epoch 38, batch 4300, loss[loss=0.007896, audio_tagging_loss=0.007896, over 23929.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4959830.56 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:29:46,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1204280.0, ans=0.0 2023-12-23 15:29:51,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1204280.0, ans=0.125 2023-12-23 15:29:53,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2023-12-23 15:30:06,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1204413.3333333333, ans=0.1 2023-12-23 15:30:06,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2023-12-23 15:30:13,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2023-12-23 15:30:21,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1204480.0, ans=0.0 2023-12-23 15:30:22,708 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.297e+01 3.587e+01 3.741e+01 3.975e+01 4.489e+01, threshold=7.482e+01, percent-clipped=0.0 2023-12-23 15:30:25,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1204546.6666666667, ans=0.0 2023-12-23 15:30:27,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1204546.6666666667, ans=0.05 2023-12-23 15:30:33,390 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:30:35,955 INFO [train.py:886] (1/4) Epoch 38, batch 4350, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4961253.60 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:30:37,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.48 vs. 
limit=5.0 2023-12-23 15:30:49,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1204680.0, ans=0.05 2023-12-23 15:30:49,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1204680.0, ans=0.125 2023-12-23 15:30:52,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1204680.0, ans=0.125 2023-12-23 15:30:57,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1204746.6666666667, ans=0.0 2023-12-23 15:31:20,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1204880.0, ans=0.0 2023-12-23 15:31:27,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1204946.6666666667, ans=0.1 2023-12-23 15:31:28,705 INFO [train.py:886] (1/4) Epoch 38, batch 4400, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4954582.70 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:32:05,839 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.267e+01 3.555e+01 3.761e+01 3.984e+01 4.470e+01, threshold=7.522e+01, percent-clipped=0.0 2023-12-23 15:32:15,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1205213.3333333333, ans=0.07 2023-12-23 15:32:20,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.56 vs. limit=22.5 2023-12-23 15:32:21,247 INFO [train.py:886] (1/4) Epoch 38, batch 4450, loss[loss=0.01066, audio_tagging_loss=0.01066, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4948383.69 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:32:21,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1205280.0, ans=0.1 2023-12-23 15:32:27,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1205280.0, ans=0.1 2023-12-23 15:32:41,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1205413.3333333333, ans=0.0 2023-12-23 15:32:49,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2023-12-23 15:32:55,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1205480.0, ans=0.0 2023-12-23 15:33:02,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1205546.6666666667, ans=22.5 2023-12-23 15:33:07,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1205546.6666666667, ans=0.0 2023-12-23 15:33:08,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. 
limit=15.0 2023-12-23 15:33:11,837 INFO [train.py:886] (1/4) Epoch 38, batch 4500, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4948964.40 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:33:31,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1205680.0, ans=10.0 2023-12-23 15:33:48,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1205813.3333333333, ans=0.0 2023-12-23 15:33:49,548 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.168e+01 3.607e+01 3.784e+01 3.969e+01 5.476e+01, threshold=7.568e+01, percent-clipped=0.0 2023-12-23 15:33:57,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1205880.0, ans=0.1 2023-12-23 15:33:58,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1205880.0, ans=0.125 2023-12-23 15:34:03,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1205880.0, ans=0.5 2023-12-23 15:34:04,995 INFO [train.py:886] (1/4) Epoch 38, batch 4550, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4954405.71 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:34:40,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1206146.6666666667, ans=0.0 2023-12-23 15:34:41,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1206146.6666666667, ans=0.0 2023-12-23 15:34:47,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1206213.3333333333, ans=0.125 2023-12-23 15:34:54,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1206213.3333333333, ans=0.125 2023-12-23 15:34:55,731 INFO [train.py:886] (1/4) Epoch 38, batch 4600, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4953397.26 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:34:59,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-12-23 15:35:18,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1206413.3333333333, ans=0.05 2023-12-23 15:35:26,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. 
limit=15.0 2023-12-23 15:35:29,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1206480.0, ans=0.2 2023-12-23 15:35:32,882 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.218e+01 3.570e+01 3.726e+01 3.915e+01 4.554e+01, threshold=7.452e+01, percent-clipped=0.0 2023-12-23 15:35:36,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1206546.6666666667, ans=0.0 2023-12-23 15:35:37,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1206546.6666666667, ans=22.5 2023-12-23 15:35:46,164 INFO [train.py:886] (1/4) Epoch 38, batch 4650, loss[loss=0.009258, audio_tagging_loss=0.009258, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4959552.03 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:35:57,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1206680.0, ans=0.125 2023-12-23 15:36:09,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1206746.6666666667, ans=0.125 2023-12-23 15:36:13,546 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:36:14,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.97 vs. limit=10.0 2023-12-23 15:36:14,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1206746.6666666667, ans=0.2 2023-12-23 15:36:24,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1206813.3333333333, ans=0.0 2023-12-23 15:36:36,155 INFO [train.py:886] (1/4) Epoch 38, batch 4700, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4954089.53 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:36:42,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-12-23 15:36:58,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1207080.0, ans=0.125 2023-12-23 15:36:59,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1207080.0, ans=0.1 2023-12-23 15:37:02,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1207080.0, ans=0.09899494936611666 2023-12-23 15:37:10,718 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.692e+01 3.810e+01 4.007e+01 4.373e+01, threshold=7.619e+01, percent-clipped=0.0 2023-12-23 15:37:11,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1207146.6666666667, ans=0.025 2023-12-23 15:37:13,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. 
limit=22.5 2023-12-23 15:37:15,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1207213.3333333333, ans=0.2 2023-12-23 15:37:15,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1207213.3333333333, ans=0.0 2023-12-23 15:37:16,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1207213.3333333333, ans=0.125 2023-12-23 15:37:23,445 INFO [train.py:886] (1/4) Epoch 38, batch 4750, loss[loss=0.01304, audio_tagging_loss=0.01304, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4946154.82 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0 2023-12-23 15:37:57,304 INFO [train.py:886] (1/4) Epoch 39, batch 0, loss[loss=0.02597, audio_tagging_loss=0.02597, over 24021.00 frames. ], tot_loss[loss=0.02597, audio_tagging_loss=0.02597, over 24021.00 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:37:57,305 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 15:38:17,049 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6785, 2.7425, 3.6391, 3.6964], device='cuda:1') 2023-12-23 15:38:17,957 INFO [train.py:917] (1/4) Epoch 39, validation: loss=0.03421, audio_tagging_loss=0.03421, over 3737520.00 frames. 2023-12-23 15:38:17,958 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 15:38:31,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.78 vs. limit=22.5 2023-12-23 15:38:46,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1207520.0, ans=0.0 2023-12-23 15:38:50,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1207586.6666666667, ans=0.0 2023-12-23 15:38:54,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1207586.6666666667, ans=0.2 2023-12-23 15:38:55,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1207586.6666666667, ans=0.0 2023-12-23 15:39:01,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1207653.3333333333, ans=0.125 2023-12-23 15:39:10,799 INFO [train.py:886] (1/4) Epoch 39, batch 50, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.01877, audio_tagging_loss=0.01877, over 1119651.94 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:39:13,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2023-12-23 15:39:30,443 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.430e+01 4.003e+01 4.538e+01 5.178e+01 1.091e+02, threshold=9.075e+01, percent-clipped=8.0 2023-12-23 15:39:37,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1207853.3333333333, ans=0.1 2023-12-23 15:40:01,769 INFO [train.py:886] (1/4) Epoch 39, batch 100, loss[loss=0.01832, audio_tagging_loss=0.01832, over 25000.00 frames. 
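
At the epoch boundary above, train.py runs a validation pass (loss=0.03421, frame-weighted over the 3,737,520-frame dev set) and zipformer.py dumps a per-head attention diagnostic, attn_weights_entropy = tensor([4.6785, 2.7425, 3.6391, 3.6964]). One plausible form of such a diagnostic is the entropy of each softmaxed attention row, averaged over batch and query positions within each head; low entropy means sharply peaked attention, while values near log(key_len) mean nearly uniform attention. A sketch with placeholder shapes:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, num_heads, query_len, key_len); rows sum to 1.
        p = attn.clamp(min=1e-20)
        ent = -(p * p.log()).sum(dim=-1)   # (batch, num_heads, query_len)
        return ent.mean(dim=(0, 2))        # one entropy per head

    attn = torch.softmax(torch.randn(8, 4, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))  # 4 values, a bit below log(100) ~ 4.6
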
], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 1971954.23 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:40:02,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1208053.3333333333, ans=0.09899494936611666 2023-12-23 15:40:21,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1208120.0, ans=0.2 2023-12-23 15:40:26,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1208186.6666666667, ans=0.1 2023-12-23 15:40:34,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1208253.3333333333, ans=0.125 2023-12-23 15:40:53,368 INFO [train.py:886] (1/4) Epoch 39, batch 150, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 2642583.01 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:41:03,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1208453.3333333333, ans=0.0 2023-12-23 15:41:14,327 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.383e+01 3.761e+01 3.990e+01 4.223e+01 5.067e+01, threshold=7.980e+01, percent-clipped=0.0 2023-12-23 15:41:34,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1208653.3333333333, ans=0.125 2023-12-23 15:41:45,774 INFO [train.py:886] (1/4) Epoch 39, batch 200, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24022.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 3157107.58 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:41:52,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1208720.0, ans=0.0 2023-12-23 15:42:02,973 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:42:36,422 INFO [train.py:886] (1/4) Epoch 39, batch 250, loss[loss=0.01174, audio_tagging_loss=0.01174, over 21924.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 3552306.04 frames. ], batch size: 107, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:42:37,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1209053.3333333333, ans=0.125 2023-12-23 15:42:55,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.07 vs. 
limit=12.0 2023-12-23 15:42:57,484 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.324e+01 3.633e+01 3.790e+01 3.994e+01 4.386e+01, threshold=7.580e+01, percent-clipped=0.0 2023-12-23 15:43:03,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1209186.6666666667, ans=0.125 2023-12-23 15:43:14,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1209253.3333333333, ans=0.125 2023-12-23 15:43:16,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1209320.0, ans=0.2 2023-12-23 15:43:21,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1209320.0, ans=0.2 2023-12-23 15:43:28,121 INFO [train.py:886] (1/4) Epoch 39, batch 300, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 3860923.47 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:43:42,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-12-23 15:43:47,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1209520.0, ans=0.1 2023-12-23 15:43:56,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-12-23 15:44:19,129 INFO [train.py:886] (1/4) Epoch 39, batch 350, loss[loss=0.01333, audio_tagging_loss=0.01333, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4099767.71 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:44:20,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1209720.0, ans=0.1 2023-12-23 15:44:25,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1209720.0, ans=0.0 2023-12-23 15:44:32,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1209786.6666666667, ans=0.0 2023-12-23 15:44:36,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1209786.6666666667, ans=0.125 2023-12-23 15:44:39,344 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+01 3.641e+01 3.787e+01 3.947e+01 4.798e+01, threshold=7.575e+01, percent-clipped=0.0 2023-12-23 15:44:43,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1209853.3333333333, ans=0.0 2023-12-23 15:44:44,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.63 vs. 
limit=10.0 2023-12-23 15:44:47,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1209853.3333333333, ans=0.125 2023-12-23 15:44:47,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1209853.3333333333, ans=0.125 2023-12-23 15:45:00,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2023-12-23 15:45:09,626 INFO [train.py:886] (1/4) Epoch 39, batch 400, loss[loss=0.009963, audio_tagging_loss=0.009963, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4290196.89 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:45:16,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1210053.3333333333, ans=0.0 2023-12-23 15:45:22,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1210120.0, ans=0.0 2023-12-23 15:45:33,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1210186.6666666667, ans=0.5 2023-12-23 15:45:57,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2023-12-23 15:46:00,631 INFO [train.py:886] (1/4) Epoch 39, batch 450, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4439287.42 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:46:02,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-12-23 15:46:13,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2023-12-23 15:46:14,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1210453.3333333333, ans=0.0 2023-12-23 15:46:18,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1210453.3333333333, ans=0.125 2023-12-23 15:46:20,293 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.320e+01 3.614e+01 3.726e+01 3.946e+01 4.381e+01, threshold=7.452e+01, percent-clipped=0.0 2023-12-23 15:46:23,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1210520.0, ans=0.2 2023-12-23 15:46:31,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1210586.6666666667, ans=0.125 2023-12-23 15:46:31,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1210586.6666666667, ans=0.0 2023-12-23 15:46:51,852 INFO [train.py:886] (1/4) Epoch 39, batch 500, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4554563.15 frames. 
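
The lr column decays smoothly with batch count and steps down when the epoch advances: 2.82e-03 and then 2.81e-03 across epoch 38, 2.77e-03 once epoch 39 begins. That is the signature of a schedule with both a batch-count term and an epoch term, for example an Eden-style rule of the following shape; the constants are placeholders and the sketch is not tuned to reproduce the logged values, which also fold in warmup and other factors:

    def eden_style_lr(base_lr: float, batch: float, epoch: float,
                      lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth decay in the batch index plus a slower decay in the epoch
        # index; both factors approach 1.0 early in training.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    for epoch in (38, 39):
        # Illustrative only; outputs will not match the log exactly.
        print(epoch, eden_style_lr(0.045, batch=1.2e6, epoch=epoch))
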
], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:46:53,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1210720.0, ans=0.2 2023-12-23 15:47:08,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1210786.6666666667, ans=0.2 2023-12-23 15:47:15,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-12-23 15:47:40,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1210986.6666666667, ans=0.125 2023-12-23 15:47:43,611 INFO [train.py:886] (1/4) Epoch 39, batch 550, loss[loss=0.00846, audio_tagging_loss=0.00846, over 24023.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4648430.10 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:48:00,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1211120.0, ans=0.125 2023-12-23 15:48:03,899 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.249e+01 3.629e+01 3.792e+01 3.921e+01 4.570e+01, threshold=7.585e+01, percent-clipped=0.0 2023-12-23 15:48:11,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1211186.6666666667, ans=0.1 2023-12-23 15:48:14,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1211253.3333333333, ans=0.125 2023-12-23 15:48:17,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1211253.3333333333, ans=0.125 2023-12-23 15:48:18,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1211253.3333333333, ans=0.1 2023-12-23 15:48:35,006 INFO [train.py:886] (1/4) Epoch 39, batch 600, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4715420.75 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:48:38,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1211386.6666666667, ans=0.0 2023-12-23 15:48:39,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-12-23 15:48:46,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1211453.3333333333, ans=10.0 2023-12-23 15:48:56,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.61 vs. limit=10.0 2023-12-23 15:48:59,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=12.0 2023-12-23 15:49:05,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. 
limit=15.0 2023-12-23 15:49:17,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1211653.3333333333, ans=0.125 2023-12-23 15:49:24,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1211653.3333333333, ans=0.125 2023-12-23 15:49:24,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1211653.3333333333, ans=0.125 2023-12-23 15:49:25,998 INFO [train.py:886] (1/4) Epoch 39, batch 650, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4756792.19 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:49:30,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2023-12-23 15:49:47,859 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.262e+01 3.662e+01 3.873e+01 3.983e+01 4.612e+01, threshold=7.746e+01, percent-clipped=0.0 2023-12-23 15:49:50,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1211853.3333333333, ans=0.125 2023-12-23 15:49:55,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1211853.3333333333, ans=0.125 2023-12-23 15:50:09,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1211986.6666666667, ans=0.125 2023-12-23 15:50:14,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1211986.6666666667, ans=0.0 2023-12-23 15:50:19,294 INFO [train.py:886] (1/4) Epoch 39, batch 700, loss[loss=0.01267, audio_tagging_loss=0.01267, over 21755.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4792521.36 frames. ], batch size: 107, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:50:19,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5 2023-12-23 15:50:21,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1212053.3333333333, ans=0.125 2023-12-23 15:50:25,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. 
limit=10.0 2023-12-23 15:50:26,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1212053.3333333333, ans=0.0 2023-12-23 15:50:38,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1212120.0, ans=0.0 2023-12-23 15:50:40,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1212186.6666666667, ans=0.1 2023-12-23 15:50:46,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1212186.6666666667, ans=0.1 2023-12-23 15:51:00,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1212320.0, ans=0.0 2023-12-23 15:51:08,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1212320.0, ans=0.125 2023-12-23 15:51:10,652 INFO [train.py:886] (1/4) Epoch 39, batch 750, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4824959.52 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:51:21,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1212453.3333333333, ans=0.0 2023-12-23 15:51:28,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1212453.3333333333, ans=0.125 2023-12-23 15:51:31,119 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.158e+01 3.606e+01 3.797e+01 3.983e+01 4.805e+01, threshold=7.593e+01, percent-clipped=0.0 2023-12-23 15:51:35,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1212520.0, ans=0.125 2023-12-23 15:51:42,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1212586.6666666667, ans=0.125 2023-12-23 15:51:48,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1212586.6666666667, ans=0.0 2023-12-23 15:51:54,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-12-23 15:52:02,556 INFO [train.py:886] (1/4) Epoch 39, batch 800, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4860542.33 frames. 
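
Every record above carries a single audio_tagging_loss term. For a multi-label tagger over an AudioSet-style inventory of event classes, a criterion of this kind is typically binary cross-entropy between clip-level logits and multi-hot labels; a minimal sketch, where the shapes and the 527-class inventory are assumptions rather than taken from this recipe's code:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # logits, labels: (batch, num_events); labels are multi-hot {0, 1}.
        return F.binary_cross_entropy_with_logits(logits, labels, reduction="mean")

    logits = torch.randn(100, 527)
    labels = (torch.rand(100, 527) < 0.02).float()  # sparse multi-hot targets
    print(audio_tagging_loss(logits, labels))

With sparse targets like these, the mean-over-classes reduction is what lets the logged per-frame losses sit around 0.01.
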
], batch size: 100, lr: 2.77e-03, grad_scale: 32.0 2023-12-23 15:52:06,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1212720.0, ans=0.1 2023-12-23 15:52:29,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1212853.3333333333, ans=0.125 2023-12-23 15:52:38,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1212920.0, ans=0.1 2023-12-23 15:52:43,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1212986.6666666667, ans=0.125 2023-12-23 15:52:46,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1212986.6666666667, ans=0.125 2023-12-23 15:52:53,719 INFO [train.py:886] (1/4) Epoch 39, batch 850, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4879398.85 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 64.0 2023-12-23 15:52:58,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1213053.3333333333, ans=0.09899494936611666 2023-12-23 15:53:00,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-12-23 15:53:04,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1213120.0, ans=0.1 2023-12-23 15:53:09,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1213120.0, ans=0.04949747468305833 2023-12-23 15:53:14,221 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.254e+01 3.592e+01 3.779e+01 3.968e+01 4.434e+01, threshold=7.558e+01, percent-clipped=0.0 2023-12-23 15:53:14,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1213186.6666666667, ans=0.0 2023-12-23 15:53:19,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1213186.6666666667, ans=0.125 2023-12-23 15:53:28,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1213253.3333333333, ans=0.125 2023-12-23 15:53:36,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-12-23 15:53:45,643 INFO [train.py:886] (1/4) Epoch 39, batch 900, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4897803.22 frames. 
], batch size: 100, lr: 2.77e-03, grad_scale: 64.0 2023-12-23 15:53:56,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1213453.3333333333, ans=0.125 2023-12-23 15:54:01,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1213453.3333333333, ans=0.125 2023-12-23 15:54:38,408 INFO [train.py:886] (1/4) Epoch 39, batch 950, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4903500.52 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:54:48,973 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:54:49,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1213786.6666666667, ans=0.1 2023-12-23 15:54:57,965 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.310e+01 3.605e+01 3.794e+01 3.970e+01 4.761e+01, threshold=7.588e+01, percent-clipped=0.0 2023-12-23 15:55:06,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1213853.3333333333, ans=0.125 2023-12-23 15:55:09,984 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:55:24,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1213986.6666666667, ans=0.2 2023-12-23 15:55:25,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1213986.6666666667, ans=0.125 2023-12-23 15:55:28,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1213986.6666666667, ans=0.1 2023-12-23 15:55:28,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2023-12-23 15:55:29,698 INFO [train.py:886] (1/4) Epoch 39, batch 1000, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4912539.95 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:55:50,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1214186.6666666667, ans=0.125 2023-12-23 15:56:02,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1214253.3333333333, ans=0.1 2023-12-23 15:56:04,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1214253.3333333333, ans=0.125 2023-12-23 15:56:12,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1214320.0, ans=0.025 2023-12-23 15:56:21,328 INFO [train.py:886] (1/4) Epoch 39, batch 1050, loss[loss=0.009489, audio_tagging_loss=0.009489, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4919940.26 frames. 
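
[annotation] In the optim.py clipping warnings, the five numbers after "grad-norm quartiles" read as min/25%/median/75%/max of recent gradient norms, and the reported threshold is consistently Clipping_scale times the median (e.g. 7.434e+01 = 2.0 x 3.717e+01 in the warning above). Below is a sketch of median-based clipping under that reading; the helper name and window size are assumptions, not the optim.py code.

    import torch

    def clip_by_median_norm(params, norm_history, clipping_scale=2.0,
                            window=200):  # window size is an assumption
        grads = [p.grad for p in params if p.grad is not None]
        # Global gradient norm of this step.
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        norm_history.append(norm)
        hist = torch.tensor(norm_history[-window:])
        quartiles = torch.quantile(
            hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()  # scale x median
        if norm > threshold:
            # Rescale so the global norm lands exactly on the threshold.
            for g in grads:
                g.mul_(threshold / norm)
        return quartiles, threshold
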
], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:56:31,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1214453.3333333333, ans=0.2 2023-12-23 15:56:32,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-12-23 15:56:42,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-12-23 15:56:42,531 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.325e+01 3.673e+01 3.795e+01 3.964e+01 4.762e+01, threshold=7.590e+01, percent-clipped=0.0 2023-12-23 15:56:48,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1214520.0, ans=0.0 2023-12-23 15:56:59,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1214586.6666666667, ans=0.125 2023-12-23 15:57:10,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1214653.3333333333, ans=0.125 2023-12-23 15:57:13,368 INFO [train.py:886] (1/4) Epoch 39, batch 1100, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4929626.95 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:57:23,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1214786.6666666667, ans=0.0 2023-12-23 15:57:28,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1214786.6666666667, ans=0.0 2023-12-23 15:57:35,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1214853.3333333333, ans=0.1 2023-12-23 15:57:40,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0 2023-12-23 15:57:47,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1214920.0, ans=0.0 2023-12-23 15:57:55,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1214986.6666666667, ans=0.015 2023-12-23 15:57:55,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1214986.6666666667, ans=0.1 2023-12-23 15:57:56,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1214986.6666666667, ans=0.125 2023-12-23 15:58:00,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1214986.6666666667, ans=0.2 2023-12-23 15:58:02,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=15.0 2023-12-23 15:58:03,993 INFO [train.py:886] (1/4) Epoch 39, batch 1150, loss[loss=0.009933, audio_tagging_loss=0.009933, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4940520.72 frames. 
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:58:20,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1215120.0, ans=0.2 2023-12-23 15:58:25,642 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 3.585e+01 3.703e+01 3.926e+01 4.731e+01, threshold=7.406e+01, percent-clipped=0.0 2023-12-23 15:58:43,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1215253.3333333333, ans=0.1 2023-12-23 15:58:55,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2023-12-23 15:58:56,854 INFO [train.py:886] (1/4) Epoch 39, batch 1200, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4943452.26 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:58:58,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1215386.6666666667, ans=0.125 2023-12-23 15:59:20,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1215520.0, ans=0.125 2023-12-23 15:59:24,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1215520.0, ans=0.125 2023-12-23 15:59:29,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1215586.6666666667, ans=0.1 2023-12-23 15:59:37,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1215653.3333333333, ans=0.125 2023-12-23 15:59:38,962 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 15:59:39,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1215653.3333333333, ans=0.125 2023-12-23 15:59:41,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1215653.3333333333, ans=0.0 2023-12-23 15:59:47,991 INFO [train.py:886] (1/4) Epoch 39, batch 1250, loss[loss=0.009977, audio_tagging_loss=0.009977, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4936809.05 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 15:59:59,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1215786.6666666667, ans=0.125 2023-12-23 16:00:08,923 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.341e+01 3.597e+01 3.795e+01 3.980e+01 4.718e+01, threshold=7.591e+01, percent-clipped=0.0 2023-12-23 16:00:38,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1215986.6666666667, ans=0.125 2023-12-23 16:00:40,165 INFO [train.py:886] (1/4) Epoch 39, batch 1300, loss[loss=0.009552, audio_tagging_loss=0.009552, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4931741.46 frames. 
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:00:57,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1216120.0, ans=0.2 2023-12-23 16:01:19,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216253.3333333333, ans=0.1 2023-12-23 16:01:30,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216320.0, ans=0.1 2023-12-23 16:01:32,454 INFO [train.py:886] (1/4) Epoch 39, batch 1350, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4931804.34 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:01:38,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1216386.6666666667, ans=0.125 2023-12-23 16:01:42,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-23 16:01:42,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1216453.3333333333, ans=0.1 2023-12-23 16:01:51,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1216453.3333333333, ans=0.125 2023-12-23 16:01:52,797 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.226e+01 3.613e+01 3.759e+01 3.931e+01 4.440e+01, threshold=7.518e+01, percent-clipped=0.0 2023-12-23 16:01:53,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0 2023-12-23 16:02:04,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1216586.6666666667, ans=0.2 2023-12-23 16:02:24,083 INFO [train.py:886] (1/4) Epoch 39, batch 1400, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4933570.55 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:02:25,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1216720.0, ans=0.125 2023-12-23 16:02:25,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1216720.0, ans=0.125 2023-12-23 16:02:34,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2023-12-23 16:02:48,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1216853.3333333333, ans=0.0 2023-12-23 16:02:57,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1216920.0, ans=0.2 2023-12-23 16:03:16,299 INFO [train.py:886] (1/4) Epoch 39, batch 1450, loss[loss=0.008901, audio_tagging_loss=0.008901, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4932345.95 frames. 
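
[annotation] The "Whitening: ... metric=M vs. limit=L" lines track how far a module's activations are from having a flat (white) covariance; the penalty only engages while the metric exceeds its limit, which is why most entries sit below it. One natural metric with that behavior is sketched below purely as an assumption (it equals 1.0 for perfectly white features and grows toward the channel count as energy collapses onto few directions); the real scaling.py formula may differ.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one module.
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)            # (groups, frames, chans_per_group)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / num_frames  # per-group covariance
        # d * trace(C^2) / trace(C)^2: equals 1.0 iff all eigenvalues match.
        num = cov.shape[-1] * (cov * cov).sum(dim=(1, 2))
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
        return (num / den).mean()
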
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:03:36,563 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.585e+01 3.717e+01 3.896e+01 4.835e+01, threshold=7.434e+01, percent-clipped=0.0 2023-12-23 16:03:36,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1217186.6666666667, ans=0.2 2023-12-23 16:04:06,459 INFO [train.py:886] (1/4) Epoch 39, batch 1500, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4936060.85 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:04:09,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1217386.6666666667, ans=0.125 2023-12-23 16:04:28,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217520.0, ans=0.1 2023-12-23 16:04:43,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217586.6666666667, ans=0.1 2023-12-23 16:04:50,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1217653.3333333333, ans=0.125 2023-12-23 16:04:57,996 INFO [train.py:886] (1/4) Epoch 39, batch 1550, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4937429.03 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:05:05,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2023-12-23 16:05:18,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.671e+01 3.823e+01 4.043e+01 4.664e+01, threshold=7.647e+01, percent-clipped=0.0 2023-12-23 16:05:26,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1217853.3333333333, ans=0.125 2023-12-23 16:05:28,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1217920.0, ans=0.0 2023-12-23 16:05:33,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1217920.0, ans=0.0 2023-12-23 16:05:49,421 INFO [train.py:886] (1/4) Epoch 39, batch 1600, loss[loss=0.009859, audio_tagging_loss=0.009859, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4938432.59 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:05:52,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1218053.3333333333, ans=0.1 2023-12-23 16:06:22,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1218253.3333333333, ans=0.2 2023-12-23 16:06:37,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2023-12-23 16:06:40,796 INFO [train.py:886] (1/4) Epoch 39, batch 1650, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4937349.40 frames. 
], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:06:55,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.44 vs. limit=5.0 2023-12-23 16:07:00,998 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.195e+01 3.628e+01 3.774e+01 3.923e+01 5.343e+01, threshold=7.548e+01, percent-clipped=0.0 2023-12-23 16:07:15,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1218586.6666666667, ans=0.125 2023-12-23 16:07:23,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1218653.3333333333, ans=0.125 2023-12-23 16:07:27,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1218653.3333333333, ans=0.2 2023-12-23 16:07:31,238 INFO [train.py:886] (1/4) Epoch 39, batch 1700, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4940460.12 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:07:31,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1218720.0, ans=0.125 2023-12-23 16:07:39,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2023-12-23 16:07:50,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1218786.6666666667, ans=0.0 2023-12-23 16:07:51,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1218853.3333333333, ans=0.125 2023-12-23 16:07:54,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5 2023-12-23 16:07:58,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=8.0 2023-12-23 16:08:06,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1218920.0, ans=0.2 2023-12-23 16:08:23,647 INFO [train.py:886] (1/4) Epoch 39, batch 1750, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4944764.48 frames. 
], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:08:24,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1219053.3333333333, ans=0.125 2023-12-23 16:08:43,373 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.314e+01 3.573e+01 3.705e+01 3.928e+01 4.407e+01, threshold=7.410e+01, percent-clipped=0.0 2023-12-23 16:08:43,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1219186.6666666667, ans=0.125 2023-12-23 16:08:54,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1219253.3333333333, ans=0.125 2023-12-23 16:09:00,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1219253.3333333333, ans=10.0 2023-12-23 16:09:13,951 INFO [train.py:886] (1/4) Epoch 39, batch 1800, loss[loss=0.01161, audio_tagging_loss=0.01161, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4950252.75 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:09:18,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1219386.6666666667, ans=0.125 2023-12-23 16:09:26,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1219453.3333333333, ans=0.1 2023-12-23 16:09:29,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.80 vs. limit=10.0 2023-12-23 16:09:50,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1219586.6666666667, ans=0.2 2023-12-23 16:10:05,624 INFO [train.py:886] (1/4) Epoch 39, batch 1850, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4950715.93 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:10:10,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1219720.0, ans=0.125 2023-12-23 16:10:25,963 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.296e+01 3.679e+01 3.832e+01 4.036e+01 5.101e+01, threshold=7.663e+01, percent-clipped=0.0 2023-12-23 16:10:39,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1219920.0, ans=0.0 2023-12-23 16:10:41,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1219920.0, ans=0.1 2023-12-23 16:10:46,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1219986.6666666667, ans=0.1 2023-12-23 16:10:57,220 INFO [train.py:886] (1/4) Epoch 39, batch 1900, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4951509.94 frames. 
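
[annotation] Note how the frame count behind tot_loss creeps toward roughly 4.95e6 and then hovers there instead of growing all epoch: that is the signature of an exponentially decayed, frame-weighted running statistic rather than a plain cumulative mean. A sketch under that assumption (the decay constant is illustrative, not taken from train.py):

    class DecayedLoss:
        def __init__(self, decay: float = 0.995):  # assumed constant
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            # Decay the old statistics, then fold in the new batch.
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    # With ~25000-frame batches the effective frame count saturates near
    # 25000 / (1 - 0.995) = 5.0e6, close to the ~4.95e6 reported above.
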
], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:11:25,006 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:11:47,546 INFO [train.py:886] (1/4) Epoch 39, batch 1950, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4950206.63 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:11:50,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1220386.6666666667, ans=0.09899494936611666 2023-12-23 16:12:06,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1220453.3333333333, ans=0.2 2023-12-23 16:12:07,898 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.588e+01 3.763e+01 3.930e+01 4.449e+01, threshold=7.526e+01, percent-clipped=0.0 2023-12-23 16:12:12,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=12.0 2023-12-23 16:12:15,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1220520.0, ans=0.0 2023-12-23 16:12:38,738 INFO [train.py:886] (1/4) Epoch 39, batch 2000, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4944786.16 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:12:41,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=15.0 2023-12-23 16:12:45,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1220720.0, ans=0.2 2023-12-23 16:12:55,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2023-12-23 16:13:29,338 INFO [train.py:886] (1/4) Epoch 39, batch 2050, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4949818.17 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:13:44,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1221120.0, ans=0.0 2023-12-23 16:13:46,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1221120.0, ans=0.2 2023-12-23 16:13:47,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221120.0, ans=0.1 2023-12-23 16:13:51,333 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.589e+01 3.729e+01 3.908e+01 4.611e+01, threshold=7.458e+01, percent-clipped=0.0 2023-12-23 16:14:06,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1221253.3333333333, ans=0.125 2023-12-23 16:14:13,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1221320.0, ans=0.125 2023-12-23 16:14:23,176 INFO [train.py:886] (1/4) Epoch 39, batch 2100, loss[loss=0.009843, audio_tagging_loss=0.009843, over 25000.00 frames. 
], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4950692.93 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:14:23,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1221386.6666666667, ans=0.125 2023-12-23 16:14:28,105 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:14:32,806 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:14:34,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2023-12-23 16:14:46,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1221520.0, ans=0.125 2023-12-23 16:15:14,108 INFO [train.py:886] (1/4) Epoch 39, batch 2150, loss[loss=0.008691, audio_tagging_loss=0.008691, over 24047.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4951429.94 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:15:15,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221720.0, ans=0.1 2023-12-23 16:15:20,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2023-12-23 16:15:26,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1221786.6666666667, ans=0.125 2023-12-23 16:15:30,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1221786.6666666667, ans=10.0 2023-12-23 16:15:33,841 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.644e+01 3.753e+01 3.947e+01 5.073e+01, threshold=7.506e+01, percent-clipped=0.0 2023-12-23 16:15:40,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2023-12-23 16:15:46,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1221920.0, ans=0.125 2023-12-23 16:16:02,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1221986.6666666667, ans=0.0 2023-12-23 16:16:04,610 INFO [train.py:886] (1/4) Epoch 39, batch 2200, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24943.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4948405.07 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:16:23,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1222120.0, ans=0.1 2023-12-23 16:16:31,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2023-12-23 16:16:38,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. 
limit=15.0 2023-12-23 16:16:39,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1222253.3333333333, ans=0.05 2023-12-23 16:16:57,386 INFO [train.py:886] (1/4) Epoch 39, batch 2250, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4939182.13 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:17:17,743 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.644e+01 3.804e+01 3.965e+01 5.338e+01, threshold=7.608e+01, percent-clipped=0.0 2023-12-23 16:17:33,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1222586.6666666667, ans=0.125 2023-12-23 16:17:49,033 INFO [train.py:886] (1/4) Epoch 39, batch 2300, loss[loss=0.009296, audio_tagging_loss=0.009296, over 24060.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4934410.95 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:17:54,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1222720.0, ans=0.1 2023-12-23 16:18:02,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0 2023-12-23 16:18:12,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1222853.3333333333, ans=0.125 2023-12-23 16:18:13,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1222853.3333333333, ans=0.125 2023-12-23 16:18:16,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1222853.3333333333, ans=0.125 2023-12-23 16:18:18,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-12-23 16:18:24,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1222920.0, ans=0.0 2023-12-23 16:18:25,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1222920.0, ans=0.0 2023-12-23 16:18:26,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1222920.0, ans=0.025 2023-12-23 16:18:29,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1222920.0, ans=0.0 2023-12-23 16:18:40,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1223053.3333333333, ans=0.0 2023-12-23 16:18:41,202 INFO [train.py:886] (1/4) Epoch 39, batch 2350, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4942662.10 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:19:02,465 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.576e+01 3.750e+01 3.912e+01 4.537e+01, threshold=7.499e+01, percent-clipped=0.0 2023-12-23 16:19:02,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1223186.6666666667, ans=0.125 2023-12-23 16:19:02,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1223186.6666666667, ans=0.125 2023-12-23 16:19:05,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1223186.6666666667, ans=0.07 2023-12-23 16:19:32,840 INFO [train.py:886] (1/4) Epoch 39, batch 2400, loss[loss=0.01104, audio_tagging_loss=0.01104, over 21981.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4938085.15 frames. ], batch size: 107, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:19:33,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1223386.6666666667, ans=0.125 2023-12-23 16:19:37,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1223386.6666666667, ans=0.2 2023-12-23 16:19:41,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1223386.6666666667, ans=0.125 2023-12-23 16:19:44,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1223453.3333333333, ans=0.2 2023-12-23 16:19:48,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-12-23 16:20:24,850 INFO [train.py:886] (1/4) Epoch 39, batch 2450, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4943226.52 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:20:28,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1223720.0, ans=0.125 2023-12-23 16:20:44,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1223786.6666666667, ans=0.125 2023-12-23 16:20:45,905 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+01 3.674e+01 3.800e+01 3.952e+01 4.172e+01, threshold=7.601e+01, percent-clipped=0.0 2023-12-23 16:20:50,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. 
limit=15.0 2023-12-23 16:20:55,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1223920.0, ans=0.125 2023-12-23 16:21:05,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1223920.0, ans=0.0 2023-12-23 16:21:11,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1223986.6666666667, ans=0.125 2023-12-23 16:21:15,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1223986.6666666667, ans=0.125 2023-12-23 16:21:17,304 INFO [train.py:886] (1/4) Epoch 39, batch 2500, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4934654.76 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:21:19,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1224053.3333333333, ans=0.1 2023-12-23 16:21:24,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1224053.3333333333, ans=0.125 2023-12-23 16:22:09,655 INFO [train.py:886] (1/4) Epoch 39, batch 2550, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4935627.18 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:22:15,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1224386.6666666667, ans=0.0 2023-12-23 16:22:15,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-12-23 16:22:21,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1224453.3333333333, ans=0.0 2023-12-23 16:22:24,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1224453.3333333333, ans=0.1 2023-12-23 16:22:24,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1224453.3333333333, ans=0.125 2023-12-23 16:22:30,077 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.292e+01 3.640e+01 3.880e+01 4.066e+01 5.144e+01, threshold=7.760e+01, percent-clipped=0.0 2023-12-23 16:22:32,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1224520.0, ans=0.1 2023-12-23 16:22:58,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1224653.3333333333, ans=0.125 2023-12-23 16:22:58,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2023-12-23 16:23:01,524 INFO [train.py:886] (1/4) Epoch 39, batch 2600, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4934581.64 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:23:05,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1224720.0, ans=0.95 2023-12-23 16:23:27,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1224853.3333333333, ans=0.125 2023-12-23 16:23:54,147 INFO [train.py:886] (1/4) Epoch 39, batch 2650, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4943869.26 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:23:59,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1225053.3333333333, ans=0.035 2023-12-23 16:24:02,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1225120.0, ans=0.125 2023-12-23 16:24:14,558 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.356e+01 3.623e+01 3.755e+01 3.962e+01 4.665e+01, threshold=7.509e+01, percent-clipped=0.0 2023-12-23 16:24:22,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1225186.6666666667, ans=0.125 2023-12-23 16:24:29,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1225253.3333333333, ans=0.1 2023-12-23 16:24:38,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2023-12-23 16:24:46,268 INFO [train.py:886] (1/4) Epoch 39, batch 2700, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4947618.41 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:24:54,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1225386.6666666667, ans=0.1 2023-12-23 16:25:04,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1225453.3333333333, ans=15.0 2023-12-23 16:25:23,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1225586.6666666667, ans=0.95 2023-12-23 16:25:30,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1225653.3333333333, ans=0.125 2023-12-23 16:25:30,535 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:25:31,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-12-23 16:25:35,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1225653.3333333333, ans=0.125 2023-12-23 16:25:37,944 INFO [train.py:886] (1/4) Epoch 39, batch 2750, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24918.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4948112.25 frames. 
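
[annotation] Throughout this log, loss and audio_tagging_loss are identical, i.e. the tagging objective is the only loss term being optimized. For multi-label tagging the conventional objective is a per-class binary cross-entropy against multi-hot targets; the sketch below shows that conventional form as an assumption about the recipe (the exact normalization behind the per-frame counts in the log is not recoverable from it).

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor,
                           targets: torch.Tensor) -> torch.Tensor:
        # logits:  (batch, num_classes) raw scores from the classifier head
        # targets: (batch, num_classes) multi-hot 0/1 event labels
        return F.binary_cross_entropy_with_logits(
            logits, targets, reduction="mean")
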
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:25:38,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-12-23 16:25:40,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1225720.0, ans=0.2 2023-12-23 16:25:50,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-12-23 16:25:59,159 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.274e+01 3.579e+01 3.766e+01 3.936e+01 4.564e+01, threshold=7.531e+01, percent-clipped=0.0 2023-12-23 16:26:10,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1225920.0, ans=0.1 2023-12-23 16:26:14,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1225920.0, ans=0.1 2023-12-23 16:26:16,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-12-23 16:26:18,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1225986.6666666667, ans=0.0 2023-12-23 16:26:18,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1225986.6666666667, ans=0.2 2023-12-23 16:26:18,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2023-12-23 16:26:20,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1225986.6666666667, ans=0.0 2023-12-23 16:26:30,269 INFO [train.py:886] (1/4) Epoch 39, batch 2800, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24941.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4948113.15 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:26:42,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1226120.0, ans=0.95 2023-12-23 16:27:09,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=1226253.3333333333, ans=12.0 2023-12-23 16:27:15,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1226320.0, ans=0.0 2023-12-23 16:27:20,925 INFO [train.py:886] (1/4) Epoch 39, batch 2850, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4942588.93 frames. 
], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:27:23,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1226386.6666666667, ans=0.1 2023-12-23 16:27:32,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1226453.3333333333, ans=0.125 2023-12-23 16:27:43,715 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.655e+01 3.774e+01 3.936e+01 6.681e+01, threshold=7.549e+01, percent-clipped=0.0 2023-12-23 16:28:03,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1226653.3333333333, ans=0.125 2023-12-23 16:28:15,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1226720.0, ans=0.09899494936611666 2023-12-23 16:28:16,152 INFO [train.py:886] (1/4) Epoch 39, batch 2900, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4941659.97 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:28:16,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1226720.0, ans=0.07 2023-12-23 16:28:21,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1226720.0, ans=0.2 2023-12-23 16:28:49,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2023-12-23 16:28:53,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1226920.0, ans=0.125 2023-12-23 16:28:54,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1226920.0, ans=0.125 2023-12-23 16:29:08,280 INFO [train.py:886] (1/4) Epoch 39, batch 2950, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4943315.71 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:29:11,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1227053.3333333333, ans=0.125 2023-12-23 16:29:28,874 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.645e+01 3.777e+01 3.932e+01 4.663e+01, threshold=7.553e+01, percent-clipped=0.0 2023-12-23 16:29:36,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1227186.6666666667, ans=0.1 2023-12-23 16:29:39,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1227253.3333333333, ans=0.1 2023-12-23 16:29:58,852 INFO [train.py:886] (1/4) Epoch 39, batch 3000, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4946932.64 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:29:58,853 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 16:30:14,063 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0673, 5.8946, 5.8356, 5.9656], device='cuda:1') 2023-12-23 16:30:16,867 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6481, 3.1793, 4.1725, 3.8810], device='cuda:1') 2023-12-23 16:30:19,984 INFO [train.py:917] (1/4) Epoch 39, validation: loss=0.03462, audio_tagging_loss=0.03462, over 3737520.00 frames. 2023-12-23 16:30:19,984 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 16:30:20,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1227386.6666666667, ans=0.09899494936611666 2023-12-23 16:30:37,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2023-12-23 16:31:03,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1227653.3333333333, ans=0.07 2023-12-23 16:31:10,896 INFO [train.py:886] (1/4) Epoch 39, batch 3050, loss[loss=0.00834, audio_tagging_loss=0.00834, over 24040.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4947245.61 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:31:11,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1227720.0, ans=0.125 2023-12-23 16:31:12,793 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:31:23,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1227786.6666666667, ans=0.0 2023-12-23 16:31:32,790 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.327e+01 3.613e+01 3.797e+01 3.917e+01 4.495e+01, threshold=7.595e+01, percent-clipped=0.0 2023-12-23 16:31:42,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-12-23 16:31:44,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1227920.0, ans=0.2 2023-12-23 16:32:03,072 INFO [train.py:886] (1/4) Epoch 39, batch 3100, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4953153.13 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:32:05,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1228053.3333333333, ans=0.5 2023-12-23 16:32:20,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. 
limit=15.0 2023-12-23 16:32:32,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1228186.6666666667, ans=15.0 2023-12-23 16:32:43,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1228320.0, ans=0.0 2023-12-23 16:32:51,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1228320.0, ans=0.125 2023-12-23 16:32:53,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1228320.0, ans=0.1 2023-12-23 16:32:55,427 INFO [train.py:886] (1/4) Epoch 39, batch 3150, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4946983.83 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:33:10,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1228453.3333333333, ans=0.0 2023-12-23 16:33:16,707 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.313e+01 3.715e+01 3.835e+01 3.978e+01 4.506e+01, threshold=7.670e+01, percent-clipped=0.0 2023-12-23 16:33:21,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=8.0 2023-12-23 16:33:31,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1228586.6666666667, ans=0.125 2023-12-23 16:33:32,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1228586.6666666667, ans=0.125 2023-12-23 16:33:33,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0 2023-12-23 16:33:36,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.40 vs. limit=10.0 2023-12-23 16:33:46,378 INFO [train.py:886] (1/4) Epoch 39, batch 3200, loss[loss=0.01045, audio_tagging_loss=0.01045, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4947587.56 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:33:49,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1228720.0, ans=0.07 2023-12-23 16:33:56,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1228786.6666666667, ans=0.125 2023-12-23 16:34:39,439 INFO [train.py:886] (1/4) Epoch 39, batch 3250, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4950518.03 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:35:00,580 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.582e+01 3.732e+01 3.928e+01 4.508e+01, threshold=7.464e+01, percent-clipped=0.0 2023-12-23 16:35:31,249 INFO [train.py:886] (1/4) Epoch 39, batch 3300, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4948103.43 frames. 
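
[annotation] The zipformer.py "attn_weights_entropy" tensors printed during the validation pass a little further up are a diagnostic of how peaked each self-attention module is. A sketch of that kind of statistic follows; the exact tensor axes and averaging are assumptions.

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, batch, num_queries, num_keys), rows sum to 1.
        eps = 1.0e-20
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per query
        return ent.mean(dim=(1, 2))  # one value per head

    # Entropy is in nats and bounded above by log(num_keys), so values in
    # the 3-6 range as printed above indicate moderately spread attention.
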
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:35:53,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1229520.0, ans=0.2 2023-12-23 16:35:53,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1229520.0, ans=0.125 2023-12-23 16:35:54,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=12.0 2023-12-23 16:35:55,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1229520.0, ans=0.125 2023-12-23 16:36:01,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1229520.0, ans=0.1 2023-12-23 16:36:02,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-12-23 16:36:05,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1229586.6666666667, ans=0.1 2023-12-23 16:36:09,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1229586.6666666667, ans=0.125 2023-12-23 16:36:15,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1229653.3333333333, ans=0.125 2023-12-23 16:36:19,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-12-23 16:36:20,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2023-12-23 16:36:22,429 INFO [train.py:886] (1/4) Epoch 39, batch 3350, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4948580.40 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:36:25,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1229720.0, ans=0.125 2023-12-23 16:36:30,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1229720.0, ans=0.125 2023-12-23 16:36:38,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1229786.6666666667, ans=0.2 2023-12-23 16:36:39,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1229786.6666666667, ans=0.1 2023-12-23 16:36:45,136 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.391e+01 3.649e+01 3.789e+01 3.930e+01 4.813e+01, threshold=7.578e+01, percent-clipped=0.0 2023-12-23 16:36:55,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. 
2023-12-23 16:36:55,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1229920.0, ans=0.0
2023-12-23 16:36:58,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1229920.0, ans=0.2
2023-12-23 16:37:09,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1229986.6666666667, ans=0.125
2023-12-23 16:37:13,998 INFO [train.py:886] (1/4) Epoch 39, batch 3400, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4950359.30 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0
2023-12-23 16:37:15,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1230053.3333333333, ans=0.1
2023-12-23 16:37:27,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1230120.0, ans=0.0
2023-12-23 16:37:28,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1230120.0, ans=0.125
2023-12-23 16:37:38,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0
2023-12-23 16:37:48,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1230253.3333333333, ans=22.5
2023-12-23 16:37:53,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1230253.3333333333, ans=0.0
2023-12-23 16:37:55,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=8.0
2023-12-23 16:37:56,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1230320.0, ans=0.125
2023-12-23 16:38:00,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1230320.0, ans=0.1
2023-12-23 16:38:06,192 INFO [train.py:886] (1/4) Epoch 39, batch 3450, loss[loss=0.01194, audio_tagging_loss=0.01194, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4946516.97 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0
2023-12-23 16:38:13,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2023-12-23 16:38:15,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1230453.3333333333, ans=0.1
2023-12-23 16:38:26,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1230520.0, ans=0.0
2023-12-23 16:38:27,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1230520.0, ans=0.1
2023-12-23 16:38:28,016 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.698e+01 3.845e+01 3.983e+01 4.520e+01, threshold=7.691e+01, percent-clipped=0.0
2023-12-23 16:38:42,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.03 vs. limit=12.0
2023-12-23 16:38:58,264 INFO [train.py:886] (1/4) Epoch 40, batch 3500, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4942819.67 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0
2023-12-23 16:38:58,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1230720.0, ans=0.0
2023-12-23 16:39:27,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1230853.3333333333, ans=0.125
2023-12-23 16:39:31,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1230920.0, ans=0.125
2023-12-23 16:39:43,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1230986.6666666667, ans=0.2
2023-12-23 16:39:49,923 INFO [train.py:886] (1/4) Epoch 39, batch 3550, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4940484.10 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0
2023-12-23 16:40:11,399 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.585e+01 3.771e+01 3.949e+01 4.246e+01, threshold=7.542e+01, percent-clipped=0.0
2023-12-23 16:40:13,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1231186.6666666667, ans=0.1
2023-12-23 16:40:13,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=15.0
2023-12-23 16:40:18,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1231186.6666666667, ans=0.1
2023-12-23 16:40:23,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1231253.3333333333, ans=0.125
2023-12-23 16:40:25,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1231253.3333333333, ans=0.1
2023-12-23 16:40:41,492 INFO [train.py:886] (1/4) Epoch 39, batch 3600, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4941043.74 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0
2023-12-23 16:40:42,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1231386.6666666667, ans=0.0
2023-12-23 16:40:49,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1231386.6666666667, ans=15.0
2023-12-23 16:40:55,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1231453.3333333333, ans=0.1
2023-12-23 16:40:59,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1231453.3333333333, ans=0.035
2023-12-23 16:41:33,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0
2023-12-23 16:41:34,303 INFO [train.py:886] (1/4) Epoch 39, batch 3650, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4947754.77 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 64.0
2023-12-23 16:41:56,216 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.259e+01 3.628e+01 3.795e+01 4.011e+01 5.130e+01, threshold=7.590e+01, percent-clipped=0.0
2023-12-23 16:42:07,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.53 vs. limit=22.5
2023-12-23 16:42:20,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.08 vs. limit=10.0
2023-12-23 16:42:26,725 INFO [train.py:886] (1/4) Epoch 39, batch 3700, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4955889.28 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 64.0
2023-12-23 16:42:26,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1232053.3333333333, ans=0.0
2023-12-23 16:42:39,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1232120.0, ans=0.2
2023-12-23 16:42:45,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1232186.6666666667, ans=0.125
2023-12-23 16:42:57,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1232253.3333333333, ans=0.125
2023-12-23 16:43:15,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1232320.0, ans=0.125
2023-12-23 16:43:17,091 INFO [train.py:886] (1/4) Epoch 39, batch 3750, loss[loss=0.01393, audio_tagging_loss=0.01393, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4949927.41 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 64.0
2023-12-23 16:43:26,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1232386.6666666667, ans=0.0
2023-12-23 16:43:29,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0
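
A note on the recurring optim.py warnings: each one reports the quartiles (min/25%/50%/75%/max) of recent gradient norms together with a clipping threshold, and in the lines above the threshold is consistently 2.0 times the median quartile (e.g. 7.590e+01 = 2.0 x 3.795e+01 in the 16:41:56 warning), matching Clipping_scale=2.0. A rough, hypothetical sketch of that bookkeeping (illustrative only, not the actual icefall optim.py code; the function name, the window size, and the exact reporting rule here are assumptions):

import torch

def clipping_stats(model, norm_history, clipping_scale=2.0, window=128):
    # Overall gradient norm for the current step (assumed definition).
    total_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    norm_history.append(total_norm.item())
    recent = torch.tensor(norm_history[-window:])
    # min / 25% / 50% / 75% / max, as printed in the warnings above.
    quartiles = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    median = quartiles[2].item()
    threshold = clipping_scale * median  # assumed rule: threshold = scale * median
    clipped = total_norm.item() > threshold
    return quartiles.tolist(), threshold, clipped

Under this reading, percent-clipped=0.0 in the warnings simply says that no recent step exceeded the running threshold.
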
2023-12-23 16:43:39,551 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.207e+01 3.636e+01 3.779e+01 3.931e+01 4.643e+01, threshold=7.558e+01, percent-clipped=0.0
2023-12-23 16:43:59,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1232653.3333333333, ans=0.1
2023-12-23 16:44:10,052 INFO [train.py:886] (1/4) Epoch 39, batch 3800, loss[loss=0.01392, audio_tagging_loss=0.01392, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4945198.91 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 64.0
2023-12-23 16:44:24,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0
2023-12-23 16:44:51,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1232986.6666666667, ans=0.125
2023-12-23 16:44:57,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1232986.6666666667, ans=0.125
2023-12-23 16:44:59,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1232986.6666666667, ans=0.125
2023-12-23 16:45:01,449 INFO [train.py:886] (1/4) Epoch 39, batch 3850, loss[loss=0.009818, audio_tagging_loss=0.009818, over 24043.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4943820.23 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:45:23,679 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.220e+01 3.590e+01 3.789e+01 3.957e+01 4.562e+01, threshold=7.578e+01, percent-clipped=0.0
2023-12-23 16:45:26,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1233186.6666666667, ans=0.125
2023-12-23 16:45:42,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1233320.0, ans=0.125
2023-12-23 16:45:53,261 INFO [train.py:886] (1/4) Epoch 39, batch 3900, loss[loss=0.008903, audio_tagging_loss=0.008903, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4949077.46 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:45:53,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0
2023-12-23 16:46:00,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1233386.6666666667, ans=0.125
2023-12-23 16:46:08,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1233453.3333333333, ans=0.125
2023-12-23 16:46:13,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1233520.0, ans=0.1
2023-12-23 16:46:15,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1233520.0, ans=0.2
2023-12-23 16:46:23,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0
2023-12-23 16:46:27,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1233586.6666666667, ans=0.035
2023-12-23 16:46:28,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0
2023-12-23 16:46:43,918 INFO [train.py:886] (1/4) Epoch 39, batch 3950, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4953689.18 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:46:50,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1233720.0, ans=0.5
2023-12-23 16:47:06,328 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.227e+01 3.602e+01 3.745e+01 4.012e+01 4.573e+01, threshold=7.490e+01, percent-clipped=0.0
2023-12-23 16:47:06,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1233853.3333333333, ans=0.0
2023-12-23 16:47:34,896 INFO [train.py:886] (1/4) Epoch 39, batch 4000, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4948928.53 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:47:35,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1234053.3333333333, ans=0.0
2023-12-23 16:47:47,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=1234120.0, ans=22.5
2023-12-23 16:47:48,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1234120.0, ans=0.125
2023-12-23 16:47:55,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1234186.6666666667, ans=0.125
2023-12-23 16:47:58,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0
2023-12-23 16:48:27,944 INFO [train.py:886] (1/4) Epoch 39, batch 4050, loss[loss=0.01525, audio_tagging_loss=0.01525, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4954611.23 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:48:40,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0
2023-12-23 16:48:44,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1234453.3333333333, ans=0.2
2023-12-23 16:48:50,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.639e+01 3.792e+01 4.036e+01 4.478e+01, threshold=7.585e+01, percent-clipped=0.0
2023-12-23 16:48:56,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1234520.0, ans=0.125
2023-12-23 16:49:02,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1234586.6666666667, ans=0.2
2023-12-23 16:49:10,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.18 vs. limit=10.0
2023-12-23 16:49:13,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1234653.3333333333, ans=0.1
2023-12-23 16:49:16,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1234653.3333333333, ans=0.1
2023-12-23 16:49:18,321 INFO [train.py:886] (1/4) Epoch 39, batch 4100, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4952068.76 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:49:25,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1234720.0, ans=0.09899494936611666
2023-12-23 16:49:35,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2023-12-23 16:49:36,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=22.5
2023-12-23 16:49:46,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1234853.3333333333, ans=0.0
2023-12-23 16:50:10,280 INFO [train.py:886] (1/4) Epoch 39, batch 4150, loss[loss=0.01366, audio_tagging_loss=0.01366, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4952409.87 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:50:33,719 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.167e+01 3.682e+01 3.809e+01 3.972e+01 4.566e+01, threshold=7.618e+01, percent-clipped=0.0
2023-12-23 16:50:39,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=22.5
2023-12-23 16:50:45,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5
2023-12-23 16:50:48,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1235253.3333333333, ans=0.0
2023-12-23 16:51:02,400 INFO [train.py:886] (1/4) Epoch 39, batch 4200, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4944471.50 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:51:02,678 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 16:51:07,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1235386.6666666667, ans=0.125
2023-12-23 16:51:12,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1235453.3333333333, ans=0.125
2023-12-23 16:51:16,763 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 16:51:19,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1235453.3333333333, ans=0.125
2023-12-23 16:51:19,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1235453.3333333333, ans=0.125
2023-12-23 16:51:24,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1235520.0, ans=0.0
2023-12-23 16:51:24,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1235520.0, ans=0.0
2023-12-23 16:51:34,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1235586.6666666667, ans=0.125
2023-12-23 16:51:50,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1235653.3333333333, ans=0.125
2023-12-23 16:51:54,131 INFO [train.py:886] (1/4) Epoch 39, batch 4250, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4942080.76 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:52:00,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1235720.0, ans=0.125
2023-12-23 16:52:17,122 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.199e+01 3.604e+01 3.815e+01 3.941e+01 4.499e+01, threshold=7.630e+01, percent-clipped=0.0
2023-12-23 16:52:19,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0
2023-12-23 16:52:24,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1235920.0, ans=0.125
2023-12-23 16:52:40,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1235986.6666666667, ans=0.125
2023-12-23 16:52:46,814 INFO [train.py:886] (1/4) Epoch 39, batch 4300, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4947642.65 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:52:51,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1236053.3333333333, ans=0.125
2023-12-23 16:52:53,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1236053.3333333333, ans=0.125
2023-12-23 16:53:12,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1236186.6666666667, ans=0.125
2023-12-23 16:53:26,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.28 vs. limit=10.0
2023-12-23 16:53:37,804 INFO [train.py:886] (1/4) Epoch 39, batch 4350, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4956728.65 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:53:42,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1236386.6666666667, ans=0.2
2023-12-23 16:53:54,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0
2023-12-23 16:53:57,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1236453.3333333333, ans=0.0
2023-12-23 16:54:00,707 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.228e+01 3.613e+01 3.861e+01 4.059e+01 4.961e+01, threshold=7.722e+01, percent-clipped=0.0
2023-12-23 16:54:01,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1236520.0, ans=0.125
2023-12-23 16:54:05,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0
2023-12-23 16:54:06,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1236520.0, ans=0.125
2023-12-23 16:54:20,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1236653.3333333333, ans=0.2
2023-12-23 16:54:29,098 INFO [train.py:886] (1/4) Epoch 39, batch 4400, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4953364.52 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:55:06,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0
2023-12-23 16:55:09,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1236920.0, ans=0.1
2023-12-23 16:55:09,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2023-12-23 16:55:20,782 INFO [train.py:886] (1/4) Epoch 39, batch 4450, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4954579.85 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:55:33,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1237120.0, ans=0.0
2023-12-23 16:55:34,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1237120.0, ans=0.0
2023-12-23 16:55:44,357 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 3.658e+01 3.824e+01 3.990e+01 4.644e+01, threshold=7.648e+01, percent-clipped=0.0
2023-12-23 16:55:51,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1237253.3333333333, ans=0.0
2023-12-23 16:56:08,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0
2023-12-23 16:56:13,238 INFO [train.py:886] (1/4) Epoch 39, batch 4500, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4958366.53 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:56:46,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1237586.6666666667, ans=0.125
2023-12-23 16:56:55,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1237653.3333333333, ans=0.0
2023-12-23 16:57:05,497 INFO [train.py:886] (1/4) Epoch 39, batch 4550, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4954362.86 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:57:07,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1237720.0, ans=0.0
2023-12-23 16:57:27,609 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.279e+01 3.606e+01 3.763e+01 4.003e+01 4.650e+01, threshold=7.525e+01, percent-clipped=0.0
2023-12-23 16:57:37,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1237920.0, ans=0.0
2023-12-23 16:57:50,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0
2023-12-23 16:57:52,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1237986.6666666667, ans=0.0
2023-12-23 16:57:56,800 INFO [train.py:886] (1/4) Epoch 39, batch 4600, loss[loss=0.009729, audio_tagging_loss=0.009729, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4953482.80 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:58:27,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1238253.3333333333, ans=0.2
2023-12-23 16:58:33,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1238253.3333333333, ans=0.125
2023-12-23 16:58:48,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1238386.6666666667, ans=0.2
2023-12-23 16:58:48,784 INFO [train.py:886] (1/4) Epoch 39, batch 4650, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4955476.72 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:59:05,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1238453.3333333333, ans=0.125
2023-12-23 16:59:08,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1238520.0, ans=0.05
2023-12-23 16:59:09,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1238520.0, ans=0.0
2023-12-23 16:59:11,644 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.683e+01 3.896e+01 4.120e+01 5.056e+01, threshold=7.792e+01, percent-clipped=0.0
2023-12-23 16:59:23,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1238586.6666666667, ans=0.125
2023-12-23 16:59:24,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5
2023-12-23 16:59:26,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0
2023-12-23 16:59:29,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1238653.3333333333, ans=0.0
2023-12-23 16:59:36,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0
2023-12-23 16:59:40,148 INFO [train.py:886] (1/4) Epoch 39, batch 4700, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4958913.90 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 16:59:40,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1238720.0, ans=0.125
2023-12-23 16:59:45,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0
2023-12-23 17:00:06,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1238853.3333333333, ans=0.125
2023-12-23 17:00:11,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=12.0
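
The ScheduledFloat entries throughout this log record hyperparameters (dropout probabilities, skip rates, balancer probabilities, whitening limits) whose value "ans" is a function of batch_count. A minimal sketch of one plausible form, a piecewise-linear schedule over batch_count (an assumption for illustration; the real ScheduledFloat in icefall's scaling.py carries more machinery, and the breakpoints below are made up):

class PiecewiseLinearSchedule:
    """Value interpolated linearly between (batch_count, value) breakpoints."""

    def __init__(self, *points):
        self.points = sorted(points)  # e.g. (0.0, 0.3), (20000.0, 0.1)

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical example: a dropout rate decaying to 0.1 early in training,
# consistent with the constant ans=0.1 logged for the dropout_p entries
# at these large batch_count values.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(1238853.0))  # -> 0.1, far past the last breakpoint

This would explain why the logged values are flat here: by batch_count above one million, every schedule has long since reached its final breakpoint.
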
2023-12-23 17:00:20,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1238986.6666666667, ans=0.125
2023-12-23 17:00:22,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1238986.6666666667, ans=0.125
2023-12-23 17:00:27,132 INFO [train.py:886] (1/4) Epoch 39, batch 4750, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4954965.59 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0
2023-12-23 17:00:38,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0
2023-12-23 17:01:01,908 INFO [train.py:886] (1/4) Epoch 40, batch 0, loss[loss=0.02588, audio_tagging_loss=0.02588, over 23951.00 frames. ], tot_loss[loss=0.02588, audio_tagging_loss=0.02588, over 23951.00 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:01:01,908 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 17:01:23,300 INFO [train.py:917] (1/4) Epoch 40, validation: loss=0.03439, audio_tagging_loss=0.03439, over 3737520.00 frames.
2023-12-23 17:01:23,300 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 17:01:25,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1239160.0, ans=0.1
2023-12-23 17:01:27,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0
2023-12-23 17:01:28,896 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.718e+01 3.892e+01 4.077e+01 1.138e+02, threshold=7.784e+01, percent-clipped=4.0
2023-12-23 17:01:38,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0
2023-12-23 17:01:44,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1239293.3333333333, ans=0.0
2023-12-23 17:01:48,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1239293.3333333333, ans=0.125
2023-12-23 17:01:55,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1239360.0, ans=0.1
2023-12-23 17:01:55,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1239360.0, ans=0.125
2023-12-23 17:01:57,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0
2023-12-23 17:02:14,205 INFO [train.py:886] (1/4) Epoch 40, batch 50, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01813, audio_tagging_loss=0.01813, over 1114624.56 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
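
At the epoch boundary above (entering epoch 40), train.py pauses to compute a validation loss over a fixed held-out set, the same 3737520.00 frames each time, before training resumes. A generic sketch of what such an evaluation step typically looks like (assumed batch layout, model signature, and normalization; not the literal icefall code):

import torch

def compute_validation_loss(model, valid_loader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["inputs"].to(device)    # assumed (N, T, num_mel_bins) fbank features
            labels = batch["labels"].to(device)   # assumed multi-hot event targets
            logits = model(feats)
            # The audio tagging loss is taken here to be a binary cross-entropy
            # over the event classes (an assumption consistent with the
            # audio_tagging_loss naming in the records above).
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                logits, labels, reduction="sum")
            frames = feats.shape[0] * feats.shape[1]
            tot_loss += loss.item()
            tot_frames += frames
    model.train()
    return tot_loss / tot_frames

Note the pattern visible in the records: the first few tot_loss values of a new epoch are averaged over very few frames (1114624.56 at batch 50), so they start high and settle as the running average accumulates.
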
2023-12-23 17:02:19,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1239493.3333333333, ans=0.125
2023-12-23 17:02:42,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0
2023-12-23 17:02:43,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1239626.6666666667, ans=0.1
2023-12-23 17:02:44,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1239693.3333333333, ans=0.1
2023-12-23 17:03:06,241 INFO [train.py:886] (1/4) Epoch 40, batch 100, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 1969856.48 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:03:06,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2023-12-23 17:03:11,825 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.790e+01 4.300e+01 4.589e+01 5.007e+01 8.087e+01, threshold=9.178e+01, percent-clipped=4.0
2023-12-23 17:03:14,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1239893.3333333333, ans=0.05
2023-12-23 17:03:21,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1239893.3333333333, ans=0.07
2023-12-23 17:03:47,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1240093.3333333333, ans=0.0
2023-12-23 17:03:53,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1240093.3333333333, ans=0.0
2023-12-23 17:03:56,510 INFO [train.py:886] (1/4) Epoch 40, batch 150, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 2634919.07 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:03:56,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1240160.0, ans=0.0
2023-12-23 17:03:59,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1240160.0, ans=0.125
2023-12-23 17:04:11,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1240226.6666666667, ans=0.0
2023-12-23 17:04:36,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1240360.0, ans=0.0
2023-12-23 17:04:48,685 INFO [train.py:886] (1/4) Epoch 40, batch 200, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 3154730.12 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:04:55,107 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.318e+01 3.719e+01 3.873e+01 4.042e+01 6.291e+01, threshold=7.746e+01, percent-clipped=0.0
2023-12-23 17:04:58,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1240560.0, ans=0.0
2023-12-23 17:05:02,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1240560.0, ans=10.0
2023-12-23 17:05:04,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1240560.0, ans=0.0
2023-12-23 17:05:13,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1240626.6666666667, ans=0.125
2023-12-23 17:05:23,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1240693.3333333333, ans=0.125
2023-12-23 17:05:39,537 INFO [train.py:886] (1/4) Epoch 40, batch 250, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 3552574.30 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:05:42,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240826.6666666667, ans=0.1
2023-12-23 17:05:45,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1240826.6666666667, ans=0.035
2023-12-23 17:05:49,582 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:06:04,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1240960.0, ans=0.125
2023-12-23 17:06:16,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1241026.6666666667, ans=0.025
2023-12-23 17:06:25,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1241093.3333333333, ans=0.0
2023-12-23 17:06:32,182 INFO [train.py:886] (1/4) Epoch 40, batch 300, loss[loss=0.01012, audio_tagging_loss=0.01012, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 3866096.85 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:06:37,793 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.703e+01 3.886e+01 3.999e+01 4.717e+01, threshold=7.771e+01, percent-clipped=0.0
2023-12-23 17:06:40,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1241226.6666666667, ans=0.125
2023-12-23 17:06:56,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1241293.3333333333, ans=0.0
2023-12-23 17:07:02,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1241293.3333333333, ans=0.2
2023-12-23 17:07:05,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0
2023-12-23 17:07:23,926 INFO [train.py:886] (1/4) Epoch 40, batch 350, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4099058.66 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:07:30,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1241493.3333333333, ans=0.0
2023-12-23 17:07:34,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0
2023-12-23 17:08:02,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1241693.3333333333, ans=0.125
2023-12-23 17:08:02,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1241693.3333333333, ans=0.05
2023-12-23 17:08:03,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1241693.3333333333, ans=0.125
2023-12-23 17:08:12,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1241760.0, ans=0.1
2023-12-23 17:08:15,531 INFO [train.py:886] (1/4) Epoch 40, batch 400, loss[loss=0.01049, audio_tagging_loss=0.01049, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4285669.01 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:08:21,793 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.642e+01 3.774e+01 3.991e+01 4.784e+01, threshold=7.549e+01, percent-clipped=0.0
2023-12-23 17:08:23,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1241826.6666666667, ans=0.0
2023-12-23 17:08:27,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1241893.3333333333, ans=0.0
2023-12-23 17:08:45,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1242026.6666666667, ans=0.2
2023-12-23 17:08:46,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1242026.6666666667, ans=0.0
2023-12-23 17:08:47,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1242026.6666666667, ans=0.0
2023-12-23 17:09:08,243 INFO [train.py:886] (1/4) Epoch 40, batch 450, loss[loss=0.01361, audio_tagging_loss=0.01361, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4433986.67 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:09:46,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1242360.0, ans=0.2
2023-12-23 17:09:50,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0
2023-12-23 17:09:53,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1242426.6666666667, ans=0.07
2023-12-23 17:09:58,579 INFO [train.py:886] (1/4) Epoch 40, batch 500, loss[loss=0.007887, audio_tagging_loss=0.007887, over 24043.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4547286.38 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:10:04,902 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.282e+01 3.617e+01 3.778e+01 3.928e+01 4.794e+01, threshold=7.557e+01, percent-clipped=0.0
2023-12-23 17:10:06,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1242493.3333333333, ans=0.07
2023-12-23 17:10:08,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1242560.0, ans=0.0
2023-12-23 17:10:09,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1242560.0, ans=0.125
2023-12-23 17:10:16,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1242560.0, ans=0.0
2023-12-23 17:10:38,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1242760.0, ans=0.0
2023-12-23 17:10:40,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1242760.0, ans=0.125
2023-12-23 17:10:40,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1242760.0, ans=0.2
2023-12-23 17:10:48,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1242760.0, ans=0.125
2023-12-23 17:10:48,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-12-23 17:10:50,637 INFO [train.py:886] (1/4) Epoch 40, batch 550, loss[loss=0.009801, audio_tagging_loss=0.009801, over 24020.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4636899.74 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:10:57,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1242826.6666666667, ans=0.0
2023-12-23 17:11:01,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.52 vs. limit=22.5
2023-12-23 17:11:02,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1242893.3333333333, ans=0.125
2023-12-23 17:11:06,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1242893.3333333333, ans=0.1
2023-12-23 17:11:23,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1243026.6666666667, ans=0.0
2023-12-23 17:11:28,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1243026.6666666667, ans=0.125
2023-12-23 17:11:33,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1243093.3333333333, ans=15.0
2023-12-23 17:11:36,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1243093.3333333333, ans=0.125
2023-12-23 17:11:41,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1243160.0, ans=0.0
2023-12-23 17:11:42,147 INFO [train.py:886] (1/4) Epoch 40, batch 600, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4705645.51 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:11:42,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1243160.0, ans=10.0
2023-12-23 17:11:42,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1243160.0, ans=0.125
2023-12-23 17:11:47,796 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.374e+01 3.661e+01 3.804e+01 3.984e+01 4.384e+01, threshold=7.608e+01, percent-clipped=0.0
2023-12-23 17:11:56,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1243226.6666666667, ans=0.0
2023-12-23 17:12:03,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1243293.3333333333, ans=0.0
2023-12-23 17:12:14,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1243360.0, ans=0.125
2023-12-23 17:12:28,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1243426.6666666667, ans=0.95
2023-12-23 17:12:34,241 INFO [train.py:886] (1/4) Epoch 40, batch 650, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4749502.48 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:12:46,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1243560.0, ans=0.125
2023-12-23 17:12:48,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1243560.0, ans=0.0
2023-12-23 17:12:52,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=1243560.0, ans=0.2
2023-12-23 17:13:00,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1243626.6666666667, ans=0.0
2023-12-23 17:13:01,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1243626.6666666667, ans=0.1
2023-12-23 17:13:10,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0
2023-12-23 17:13:14,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1243760.0, ans=0.125
2023-12-23 17:13:18,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1243760.0, ans=0.125
2023-12-23 17:13:25,778 INFO [train.py:886] (1/4) Epoch 40, batch 700, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4786681.95 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:13:32,122 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.657e+01 3.830e+01 4.035e+01 4.623e+01, threshold=7.660e+01, percent-clipped=0.0
2023-12-23 17:14:15,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1244093.3333333333, ans=0.1
2023-12-23 17:14:18,463 INFO [train.py:886] (1/4) Epoch 40, batch 750, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4825817.89 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:14:34,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0
2023-12-23 17:14:41,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1244293.3333333333, ans=0.0
2023-12-23 17:14:51,991 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:14:54,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1244360.0, ans=0.95
2023-12-23 17:15:03,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1244426.6666666667, ans=0.125
2023-12-23 17:15:03,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1244426.6666666667, ans=0.1
2023-12-23 17:15:09,549 INFO [train.py:886] (1/4) Epoch 40, batch 800, loss[loss=0.01328, audio_tagging_loss=0.01328, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4853334.22 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:15:16,527 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.149e+01 3.626e+01 3.799e+01 3.971e+01 4.663e+01, threshold=7.598e+01, percent-clipped=0.0
2023-12-23 17:15:21,258 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:15:46,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1244693.3333333333, ans=0.125
2023-12-23 17:15:56,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1244760.0, ans=0.05
2023-12-23 17:15:59,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.03 vs. limit=12.0
2023-12-23 17:16:01,963 INFO [train.py:886] (1/4) Epoch 40, batch 850, loss[loss=0.009816, audio_tagging_loss=0.009816, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4874739.54 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0
2023-12-23 17:16:11,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.60 vs. limit=10.0
2023-12-23 17:16:23,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1244960.0, ans=0.125
2023-12-23 17:16:25,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1244960.0, ans=0.125
2023-12-23 17:16:40,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1245026.6666666667, ans=0.125
2023-12-23 17:16:42,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1245093.3333333333, ans=0.0
2023-12-23 17:16:54,245 INFO [train.py:886] (1/4) Epoch 40, batch 900, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4885871.76 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 32.0
2023-12-23 17:17:00,646 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.293e+01 3.650e+01 3.794e+01 3.949e+01 4.349e+01, threshold=7.587e+01, percent-clipped=0.0
2023-12-23 17:17:13,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1245293.3333333333, ans=0.125
2023-12-23 17:17:18,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1245293.3333333333, ans=0.1
2023-12-23 17:17:19,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1245293.3333333333, ans=0.0
2023-12-23 17:17:27,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1245360.0, ans=0.1
2023-12-23 17:17:28,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0
limit=15.0 2023-12-23 17:17:28,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1245360.0, ans=0.0 2023-12-23 17:17:39,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1245426.6666666667, ans=0.1 2023-12-23 17:17:46,209 INFO [train.py:886] (1/4) Epoch 40, batch 950, loss[loss=0.01009, audio_tagging_loss=0.01009, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4892078.51 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 32.0 2023-12-23 17:17:59,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1245560.0, ans=0.2 2023-12-23 17:18:05,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1245560.0, ans=0.0 2023-12-23 17:18:21,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-12-23 17:18:28,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=22.5 2023-12-23 17:18:37,909 INFO [train.py:886] (1/4) Epoch 40, batch 1000, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24750.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4899572.95 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 32.0 2023-12-23 17:18:43,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1245826.6666666667, ans=0.2 2023-12-23 17:18:44,300 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.312e+01 3.606e+01 3.769e+01 4.019e+01 4.543e+01, threshold=7.537e+01, percent-clipped=0.0 2023-12-23 17:19:05,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2023-12-23 17:19:13,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1246026.6666666667, ans=0.125 2023-12-23 17:19:17,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-12-23 17:19:21,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0 2023-12-23 17:19:28,907 INFO [train.py:886] (1/4) Epoch 40, batch 1050, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4911177.57 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:19:39,940 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:19:57,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1246293.3333333333, ans=0.125 2023-12-23 17:20:07,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1246360.0, ans=0.125 2023-12-23 17:20:21,900 INFO [train.py:886] (1/4) Epoch 40, batch 1100, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. 
], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4921388.53 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:20:27,710 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.272e+01 3.694e+01 3.833e+01 4.003e+01 4.303e+01, threshold=7.667e+01, percent-clipped=0.0 2023-12-23 17:20:33,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1246560.0, ans=0.1 2023-12-23 17:20:48,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1246626.6666666667, ans=0.0 2023-12-23 17:21:14,031 INFO [train.py:886] (1/4) Epoch 40, batch 1150, loss[loss=0.009098, audio_tagging_loss=0.009098, over 24086.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4932467.38 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:21:14,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1246826.6666666667, ans=0.2 2023-12-23 17:21:18,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1246826.6666666667, ans=0.0 2023-12-23 17:21:22,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1246826.6666666667, ans=0.0 2023-12-23 17:21:26,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1246893.3333333333, ans=0.09899494936611666 2023-12-23 17:21:30,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1246893.3333333333, ans=0.125 2023-12-23 17:21:37,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1246960.0, ans=0.2 2023-12-23 17:21:40,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1246960.0, ans=0.09899494936611666 2023-12-23 17:21:50,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=8.0 2023-12-23 17:22:05,588 INFO [train.py:886] (1/4) Epoch 40, batch 1200, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4943714.52 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:22:11,284 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.680e+01 3.818e+01 4.022e+01 4.894e+01, threshold=7.635e+01, percent-clipped=0.0 2023-12-23 17:22:36,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1247360.0, ans=10.0 2023-12-23 17:22:38,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1247360.0, ans=0.0 2023-12-23 17:22:42,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1247360.0, ans=0.125 2023-12-23 17:22:47,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1247426.6666666667, ans=0.1 2023-12-23 17:22:57,008 INFO [train.py:886] (1/4) Epoch 40, batch 1250, loss[loss=0.01159, audio_tagging_loss=0.01159, over 24750.00 frames. 
], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4937826.22 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:22:57,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1247493.3333333333, ans=0.2 2023-12-23 17:22:58,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1247493.3333333333, ans=0.125 2023-12-23 17:22:58,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-23 17:23:02,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2023-12-23 17:23:17,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2023-12-23 17:23:42,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1247760.0, ans=0.125 2023-12-23 17:23:42,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1247760.0, ans=0.125 2023-12-23 17:23:46,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1247760.0, ans=0.2 2023-12-23 17:23:48,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1247826.6666666667, ans=15.0 2023-12-23 17:23:48,810 INFO [train.py:886] (1/4) Epoch 40, batch 1300, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24095.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4934014.20 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:23:54,415 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.303e+01 3.637e+01 3.803e+01 3.958e+01 5.134e+01, threshold=7.605e+01, percent-clipped=0.0 2023-12-23 17:24:02,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1247893.3333333333, ans=0.125 2023-12-23 17:24:11,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1247960.0, ans=0.0 2023-12-23 17:24:15,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1247960.0, ans=0.1 2023-12-23 17:24:15,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1247960.0, ans=0.125 2023-12-23 17:24:30,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2023-12-23 17:24:41,340 INFO [train.py:886] (1/4) Epoch 40, batch 1350, loss[loss=0.009933, audio_tagging_loss=0.009933, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4935823.62 frames. 
], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:24:42,404 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:24:45,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.42 vs. limit=15.0 2023-12-23 17:25:30,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1248426.6666666667, ans=0.1 2023-12-23 17:25:32,070 INFO [train.py:886] (1/4) Epoch 40, batch 1400, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4941019.08 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:25:32,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1248493.3333333333, ans=0.125 2023-12-23 17:25:39,075 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.609e+01 3.720e+01 3.896e+01 4.432e+01, threshold=7.440e+01, percent-clipped=0.0 2023-12-23 17:26:03,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2023-12-23 17:26:24,062 INFO [train.py:886] (1/4) Epoch 40, batch 1450, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4945761.42 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:26:26,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1248826.6666666667, ans=0.125 2023-12-23 17:26:54,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1249026.6666666667, ans=0.0 2023-12-23 17:26:58,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1249026.6666666667, ans=0.1 2023-12-23 17:27:15,482 INFO [train.py:886] (1/4) Epoch 40, batch 1500, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4951055.75 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:27:22,486 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.173e+01 3.651e+01 3.787e+01 4.002e+01 4.522e+01, threshold=7.575e+01, percent-clipped=0.0 2023-12-23 17:27:40,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1249293.3333333333, ans=0.125 2023-12-23 17:27:58,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1249426.6666666667, ans=0.1 2023-12-23 17:27:59,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=22.5 2023-12-23 17:28:03,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1249426.6666666667, ans=0.0 2023-12-23 17:28:08,007 INFO [train.py:886] (1/4) Epoch 40, batch 1550, loss[loss=0.01256, audio_tagging_loss=0.01256, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4940808.30 frames. 
], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:28:13,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1249493.3333333333, ans=0.125 2023-12-23 17:28:16,345 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:28:19,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-12-23 17:28:36,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1249626.6666666667, ans=0.125 2023-12-23 17:28:46,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-12-23 17:28:50,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1249760.0, ans=0.1 2023-12-23 17:28:51,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1249760.0, ans=0.0 2023-12-23 17:28:59,838 INFO [train.py:886] (1/4) Epoch 40, batch 1600, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4944092.75 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:29:05,487 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.668e+01 3.855e+01 4.043e+01 4.586e+01, threshold=7.710e+01, percent-clipped=0.0 2023-12-23 17:29:08,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1249826.6666666667, ans=0.2 2023-12-23 17:29:10,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1249893.3333333333, ans=0.125 2023-12-23 17:29:18,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.57 vs. limit=22.5 2023-12-23 17:29:21,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1249960.0, ans=0.125 2023-12-23 17:29:21,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1249960.0, ans=0.2 2023-12-23 17:29:41,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1250093.3333333333, ans=0.1 2023-12-23 17:29:42,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1250093.3333333333, ans=0.07 2023-12-23 17:29:44,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1250093.3333333333, ans=0.125 2023-12-23 17:29:51,689 INFO [train.py:886] (1/4) Epoch 40, batch 1650, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4941810.02 frames. 
], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:30:09,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1250226.6666666667, ans=0.125 2023-12-23 17:30:29,919 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:30:38,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-12-23 17:30:42,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1250493.3333333333, ans=0.0 2023-12-23 17:30:43,756 INFO [train.py:886] (1/4) Epoch 40, batch 1700, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4945996.10 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:30:50,035 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.211e+01 3.572e+01 3.767e+01 3.944e+01 4.587e+01, threshold=7.535e+01, percent-clipped=0.0 2023-12-23 17:30:56,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1250560.0, ans=0.1 2023-12-23 17:31:03,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1250560.0, ans=0.1 2023-12-23 17:31:07,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1250626.6666666667, ans=0.1 2023-12-23 17:31:09,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1250626.6666666667, ans=0.1 2023-12-23 17:31:35,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=22.5 2023-12-23 17:31:36,373 INFO [train.py:886] (1/4) Epoch 40, batch 1750, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4948203.74 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:31:43,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1250826.6666666667, ans=0.0 2023-12-23 17:31:57,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. 
limit=15.0 2023-12-23 17:32:12,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1251026.6666666667, ans=0.07 2023-12-23 17:32:15,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1251026.6666666667, ans=0.125 2023-12-23 17:32:23,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1251093.3333333333, ans=0.5 2023-12-23 17:32:26,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1251093.3333333333, ans=0.2 2023-12-23 17:32:28,087 INFO [train.py:886] (1/4) Epoch 40, batch 1800, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4955916.88 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:32:34,598 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.246e+01 3.649e+01 3.797e+01 4.032e+01 4.855e+01, threshold=7.595e+01, percent-clipped=0.0 2023-12-23 17:32:42,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2023-12-23 17:32:57,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1251293.3333333333, ans=0.125 2023-12-23 17:32:58,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1251360.0, ans=0.125 2023-12-23 17:33:07,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1251360.0, ans=0.2 2023-12-23 17:33:20,814 INFO [train.py:886] (1/4) Epoch 40, batch 1850, loss[loss=0.00679, audio_tagging_loss=0.00679, over 24057.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4958624.71 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:33:32,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1251560.0, ans=0.0 2023-12-23 17:33:54,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1251693.3333333333, ans=0.125 2023-12-23 17:34:12,176 INFO [train.py:886] (1/4) Epoch 40, batch 1900, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4957912.54 frames. 
], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:34:18,622 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.703e+01 3.895e+01 4.075e+01 4.598e+01, threshold=7.791e+01, percent-clipped=0.0 2023-12-23 17:34:25,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1251893.3333333333, ans=0.125 2023-12-23 17:34:32,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1251960.0, ans=0.125 2023-12-23 17:34:38,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1251960.0, ans=0.125 2023-12-23 17:34:39,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1251960.0, ans=0.125 2023-12-23 17:35:00,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1252093.3333333333, ans=0.125 2023-12-23 17:35:01,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1252093.3333333333, ans=0.1 2023-12-23 17:35:04,807 INFO [train.py:886] (1/4) Epoch 40, batch 1950, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4947567.69 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:35:08,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1252160.0, ans=0.125 2023-12-23 17:35:10,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1252160.0, ans=0.125 2023-12-23 17:35:15,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1252226.6666666667, ans=0.125 2023-12-23 17:35:17,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1252226.6666666667, ans=0.125 2023-12-23 17:35:56,367 INFO [train.py:886] (1/4) Epoch 40, batch 2000, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4945283.55 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:35:59,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2023-12-23 17:36:02,095 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.938e+01 3.596e+01 3.830e+01 3.994e+01 4.617e+01, threshold=7.661e+01, percent-clipped=0.0 2023-12-23 17:36:10,479 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:36:42,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1252760.0, ans=0.125 2023-12-23 17:36:48,986 INFO [train.py:886] (1/4) Epoch 40, batch 2050, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4944487.87 frames. 
], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:36:56,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1252826.6666666667, ans=0.125 2023-12-23 17:37:03,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2023-12-23 17:37:05,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-12-23 17:37:23,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1253026.6666666667, ans=0.2 2023-12-23 17:37:38,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1253093.3333333333, ans=0.09899494936611666 2023-12-23 17:37:39,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-12-23 17:37:39,749 INFO [train.py:886] (1/4) Epoch 40, batch 2100, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4948925.59 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:37:46,109 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.253e+01 3.675e+01 3.853e+01 3.940e+01 4.464e+01, threshold=7.707e+01, percent-clipped=0.0 2023-12-23 17:37:50,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1253226.6666666667, ans=0.0 2023-12-23 17:37:50,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1253226.6666666667, ans=0.0 2023-12-23 17:37:59,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1253293.3333333333, ans=0.125 2023-12-23 17:38:03,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1253293.3333333333, ans=0.1 2023-12-23 17:38:22,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2023-12-23 17:38:32,978 INFO [train.py:886] (1/4) Epoch 40, batch 2150, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4955625.18 frames. 
], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:38:46,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1253560.0, ans=0.125 2023-12-23 17:38:54,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1253626.6666666667, ans=0.0 2023-12-23 17:39:02,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1253693.3333333333, ans=0.125 2023-12-23 17:39:11,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1253693.3333333333, ans=0.125 2023-12-23 17:39:16,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1253760.0, ans=0.0 2023-12-23 17:39:24,482 INFO [train.py:886] (1/4) Epoch 40, batch 2200, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4952492.47 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:39:25,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1253826.6666666667, ans=0.2 2023-12-23 17:39:30,818 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.640e+01 3.872e+01 4.053e+01 7.102e+01, threshold=7.744e+01, percent-clipped=0.0 2023-12-23 17:39:42,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-12-23 17:39:48,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253960.0, ans=0.1 2023-12-23 17:39:58,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-12-23 17:40:11,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1254093.3333333333, ans=0.2 2023-12-23 17:40:13,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0 2023-12-23 17:40:15,871 INFO [train.py:886] (1/4) Epoch 40, batch 2250, loss[loss=0.009546, audio_tagging_loss=0.009546, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4949594.57 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:40:31,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2023-12-23 17:40:36,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1254293.3333333333, ans=0.125 2023-12-23 17:40:36,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2023-12-23 17:41:04,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1254426.6666666667, ans=0.0 2023-12-23 17:41:08,088 INFO [train.py:886] (1/4) Epoch 40, batch 2300, loss[loss=0.00976, audio_tagging_loss=0.00976, over 24750.00 frames. 
], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4947791.09 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:41:13,762 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.326e+01 3.633e+01 3.746e+01 3.973e+01 4.571e+01, threshold=7.491e+01, percent-clipped=0.0 2023-12-23 17:41:19,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1254560.0, ans=0.0 2023-12-23 17:41:25,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1254560.0, ans=0.125 2023-12-23 17:41:25,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1254560.0, ans=0.1 2023-12-23 17:41:36,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1254626.6666666667, ans=0.125 2023-12-23 17:41:38,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1254693.3333333333, ans=0.1 2023-12-23 17:41:39,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1254693.3333333333, ans=0.0 2023-12-23 17:41:55,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1254760.0, ans=0.125 2023-12-23 17:41:59,042 INFO [train.py:886] (1/4) Epoch 40, batch 2350, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4950732.74 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:42:01,801 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:42:12,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1254893.3333333333, ans=0.0 2023-12-23 17:42:12,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1254893.3333333333, ans=0.125 2023-12-23 17:42:16,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1254893.3333333333, ans=0.125 2023-12-23 17:42:22,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1254960.0, ans=0.1 2023-12-23 17:42:24,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1254960.0, ans=0.125 2023-12-23 17:42:28,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1254960.0, ans=0.0 2023-12-23 17:42:51,670 INFO [train.py:886] (1/4) Epoch 40, batch 2400, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4949037.94 frames. 
], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:42:57,950 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.192e+01 3.652e+01 3.798e+01 3.956e+01 4.622e+01, threshold=7.596e+01, percent-clipped=0.0 2023-12-23 17:43:07,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1255226.6666666667, ans=0.0 2023-12-23 17:43:08,450 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:43:11,149 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:43:15,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1255293.3333333333, ans=0.125 2023-12-23 17:43:23,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1255360.0, ans=0.0 2023-12-23 17:43:24,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1255360.0, ans=0.125 2023-12-23 17:43:30,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1255360.0, ans=0.125 2023-12-23 17:43:41,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1255426.6666666667, ans=0.125 2023-12-23 17:43:43,024 INFO [train.py:886] (1/4) Epoch 40, batch 2450, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4956026.63 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:43:45,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1255493.3333333333, ans=0.0 2023-12-23 17:43:54,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1255560.0, ans=0.0 2023-12-23 17:43:55,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1255560.0, ans=0.125 2023-12-23 17:44:01,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1255560.0, ans=0.125 2023-12-23 17:44:01,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-12-23 17:44:02,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2023-12-23 17:44:05,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=1255626.6666666667, ans=0.2 2023-12-23 17:44:19,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.88 vs. 
limit=22.5 2023-12-23 17:44:26,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1255760.0, ans=0.2 2023-12-23 17:44:30,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1255760.0, ans=0.125 2023-12-23 17:44:34,765 INFO [train.py:886] (1/4) Epoch 40, batch 2500, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4954062.46 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:44:35,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1255826.6666666667, ans=0.0 2023-12-23 17:44:40,516 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.292e+01 3.682e+01 3.858e+01 3.997e+01 4.509e+01, threshold=7.716e+01, percent-clipped=0.0 2023-12-23 17:44:40,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-12-23 17:44:50,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1255893.3333333333, ans=0.5 2023-12-23 17:44:55,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=12.0 2023-12-23 17:45:07,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2023-12-23 17:45:14,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1256026.6666666667, ans=0.125 2023-12-23 17:45:26,461 INFO [train.py:886] (1/4) Epoch 40, batch 2550, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4945323.52 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:45:51,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1256293.3333333333, ans=0.09899494936611666 2023-12-23 17:45:56,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1256360.0, ans=0.2 2023-12-23 17:46:03,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1256360.0, ans=10.0 2023-12-23 17:46:17,270 INFO [train.py:886] (1/4) Epoch 40, batch 2600, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4945412.77 frames. 
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:46:22,945 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.345e+01 3.708e+01 3.865e+01 4.030e+01 5.247e+01, threshold=7.731e+01, percent-clipped=0.0 2023-12-23 17:46:23,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1256493.3333333333, ans=0.125 2023-12-23 17:46:30,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1256560.0, ans=0.125 2023-12-23 17:46:41,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=22.5 2023-12-23 17:46:49,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1256693.3333333333, ans=0.125 2023-12-23 17:46:58,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1256760.0, ans=0.125 2023-12-23 17:46:59,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=22.5 2023-12-23 17:47:01,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-12-23 17:47:02,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1256760.0, ans=0.1 2023-12-23 17:47:03,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1256760.0, ans=0.125 2023-12-23 17:47:09,737 INFO [train.py:886] (1/4) Epoch 40, batch 2650, loss[loss=0.009059, audio_tagging_loss=0.009059, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4949366.61 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:48:00,530 INFO [train.py:886] (1/4) Epoch 40, batch 2700, loss[loss=0.01023, audio_tagging_loss=0.01023, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4954582.50 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:48:00,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1257160.0, ans=0.1 2023-12-23 17:48:06,813 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.216e+01 3.619e+01 3.803e+01 3.974e+01 4.399e+01, threshold=7.606e+01, percent-clipped=0.0 2023-12-23 17:48:38,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2023-12-23 17:48:44,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1257426.6666666667, ans=0.0 2023-12-23 17:48:46,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1257426.6666666667, ans=0.125 2023-12-23 17:48:52,685 INFO [train.py:886] (1/4) Epoch 40, batch 2750, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4950632.08 frames. 
], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:49:14,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1257626.6666666667, ans=0.09899494936611666 2023-12-23 17:49:25,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1257693.3333333333, ans=0.125 2023-12-23 17:49:27,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1257693.3333333333, ans=0.125 2023-12-23 17:49:30,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2023-12-23 17:49:39,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1257760.0, ans=0.05 2023-12-23 17:49:44,292 INFO [train.py:886] (1/4) Epoch 40, batch 2800, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4949442.95 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:49:51,398 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.697e+01 3.809e+01 4.002e+01 4.614e+01, threshold=7.617e+01, percent-clipped=0.0 2023-12-23 17:49:54,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1257893.3333333333, ans=0.125 2023-12-23 17:50:36,624 INFO [train.py:886] (1/4) Epoch 40, batch 2850, loss[loss=0.01319, audio_tagging_loss=0.01319, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4944735.37 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:50:38,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1258160.0, ans=0.2 2023-12-23 17:51:14,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1258360.0, ans=0.2 2023-12-23 17:51:23,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2023-12-23 17:51:28,920 INFO [train.py:886] (1/4) Epoch 40, batch 2900, loss[loss=0.009185, audio_tagging_loss=0.009185, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4939943.98 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:51:30,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. 
limit=6.0 2023-12-23 17:51:32,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1258493.3333333333, ans=0.2 2023-12-23 17:51:34,565 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.415e+01 3.699e+01 3.835e+01 4.007e+01 4.764e+01, threshold=7.669e+01, percent-clipped=0.0 2023-12-23 17:51:46,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1258560.0, ans=0.0 2023-12-23 17:51:48,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1258626.6666666667, ans=0.1 2023-12-23 17:52:20,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1258826.6666666667, ans=10.0 2023-12-23 17:52:20,887 INFO [train.py:886] (1/4) Epoch 40, batch 2950, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4942647.00 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:52:23,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1258826.6666666667, ans=0.125 2023-12-23 17:52:24,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.49 vs. limit=22.5 2023-12-23 17:52:53,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1259026.6666666667, ans=0.125 2023-12-23 17:52:56,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1259026.6666666667, ans=0.2 2023-12-23 17:53:10,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1259093.3333333333, ans=0.0 2023-12-23 17:53:12,734 INFO [train.py:886] (1/4) Epoch 40, batch 3000, loss[loss=0.009972, audio_tagging_loss=0.009972, over 24032.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4947714.51 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:53:12,734 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 17:53:33,972 INFO [train.py:917] (1/4) Epoch 40, validation: loss=0.03529, audio_tagging_loss=0.03529, over 3737520.00 frames. 
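Editor's note on the "loss=... over N frames" / "tot_loss=... over N frames" fields above (e.g. "validation: loss=0.03529, audio_tagging_loss=0.03529, over 3737520.00 frames"): the per-record value is the loss on the current batch, while tot_loss pools recent batches weighted by their frame counts. A minimal sketch of that bookkeeping, assuming frame-weighted averaging; the class and method names are invented for illustration and are not taken from train.py:

    from collections import defaultdict

    class LossTracker:
        """Frame-weighted running averages, as in 'loss=... over N frames.'"""
        def __init__(self):
            self.sums = defaultdict(float)  # metric name -> sum(value * frames)
            self.frames = 0.0               # total frames pooled so far

        def update(self, losses, num_frames):
            for name, value in losses.items():
                self.sums[name] += value * num_frames
            self.frames += num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}

    tracker = LossTracker()
    tracker.update({"loss": 0.01218, "audio_tagging_loss": 0.01218}, 25000.0)
    tracker.update({"loss": 0.01057, "audio_tagging_loss": 0.01057}, 25000.0)
    print(tracker.averages())  # pooled values, analogous to tot_loss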
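The optim.py:484 warnings report five quantiles (min, 25%, median, 75%, max) of recent gradient norms together with a threshold that tracks Clipping_scale (2.0) times the median quantile; e.g. the next record below shows a median of 3.801e+01 and threshold=7.602e+01. A hedged sketch of such median-based clipping, assuming a sliding window of recent norms; the window size and percentile choices are assumptions, not optim.py's actual parameters:

    import numpy as np

    class GradNormClipper:
        """Clip gradients at clipping_scale times the median of recent norms."""
        def __init__(self, clipping_scale=2.0, window=200):
            self.scale = clipping_scale
            self.window = window
            self.norms = []

        def clip_factor(self, grad_norm):
            # Keep a sliding window of recent gradient norms.
            self.norms = (self.norms + [float(grad_norm)])[-self.window:]
            quartiles = np.percentile(self.norms, [0, 25, 50, 75, 100])
            threshold = self.scale * quartiles[2]  # e.g. 2.0 * 3.801e+01 = 7.602e+01
            if grad_norm > threshold:
                return threshold / grad_norm       # multiply gradients by this factor
            return 1.0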
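The scaling.py:213 lines record scheduled hyperparameters: each named quantity (dropout probabilities, skip rates, balancer bounds) is a function of batch_count, and "ans" is its current value. One simple way to implement such a schedule is piecewise-linear interpolation over (batch_count, value) breakpoints; the sketch below is a generic illustration of that idea, and the breakpoints in the example are invented, not read from the log:

    import bisect

    class PiecewiseLinearSchedule:
        """Value as a piecewise-linear function of batch_count."""
        def __init__(self, points):
            self.xs = [x for x, _ in points]  # breakpoints, sorted ascending
            self.ys = [y for _, y in points]

        def __call__(self, batch_count):
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Invented breakpoints: a dropout probability decaying over training.
    dropout_p = PiecewiseLinearSchedule([(0.0, 0.3), (20000.0, 0.1)])
    print(dropout_p(10000.0))  # 0.2, halfway between the two breakpoints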
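The scaling.py:1022 "Whitening" lines compare a metric against a limit (e.g. "metric=4.45 vs. limit=6.0" just above): the metric measures how far a group of activations is from having a white (isotropic) covariance, equals 1.0 when the covariance is a multiple of the identity, and grows as variance concentrates in few directions; a penalty applies only when it exceeds the limit. The formula below is one plausible definition written for illustration, not copied from scaling.py:

    import torch

    def whitening_metric(x, num_groups):
        # x: (num_frames, num_channels); channels split into equal groups.
        frames, channels = x.shape
        d = channels // num_groups
        xg = x.reshape(frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
        cov = torch.matmul(xg.transpose(1, 2), xg) / frames    # (groups, d, d)
        tr_c = cov.diagonal(dim1=1, dim2=2).sum(-1)            # trace(C)
        tr_c2 = (cov * cov.transpose(1, 2)).sum((1, 2))        # trace(C @ C)
        # d * trace(C^2) / trace(C)^2 == 1 for a white covariance.
        return (d * tr_c2 / tr_c.clamp(min=1e-20) ** 2).mean().item()

    x = torch.randn(1000, 512)     # white-ish input
    print(whitening_metric(x, 1))  # near 1.0, far below limits like 15.0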
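grad_scale in the progress records is the fp16 loss-scaling factor; it doubles from 32.0 to 64.0 between batches 1000 and 1050 above and then holds, which matches standard dynamic loss scaling, where the scale grows after a run of overflow-free steps. A generic PyTorch AMP sketch of that mechanism follows; the model, optimizer, dimensions, and growth_interval are placeholders, not icefall's actual training loop:

    import torch

    # Requires a CUDA device, like the training run itself.
    model = torch.nn.Linear(80, 527).cuda()  # AudioSet-style multi-label head; dims are placeholders
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       growth_interval=2000)

    for step in range(5):
        features = torch.randn(8, 80, device="cuda")
        targets = torch.randint(0, 2, (8, 527), device="cuda").float()
        with torch.cuda.amp.autocast():
            logits = model(features)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                logits, targets)
        optimizer.zero_grad()
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # unscales, skips the step on overflow
        scaler.update()                 # doubles the scale after enough clean steps

    print(scaler.get_scale())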
2023-12-23 17:53:33,972 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 17:53:39,601 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.612e+01 3.801e+01 4.054e+01 4.780e+01, threshold=7.602e+01, percent-clipped=0.0 2023-12-23 17:53:44,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1259226.6666666667, ans=0.1 2023-12-23 17:53:44,219 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:53:49,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1259226.6666666667, ans=0.125 2023-12-23 17:53:49,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1259226.6666666667, ans=0.0 2023-12-23 17:53:53,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1259293.3333333333, ans=0.125 2023-12-23 17:53:55,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1259293.3333333333, ans=0.125 2023-12-23 17:53:59,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1259293.3333333333, ans=0.0 2023-12-23 17:53:59,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1259293.3333333333, ans=0.0 2023-12-23 17:54:09,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0 2023-12-23 17:54:18,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1259426.6666666667, ans=0.07 2023-12-23 17:54:25,435 INFO [train.py:886] (1/4) Epoch 40, batch 3050, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4949463.72 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:54:30,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2023-12-23 17:55:08,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1259760.0, ans=0.0 2023-12-23 17:55:12,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2023-12-23 17:55:16,905 INFO [train.py:886] (1/4) Epoch 40, batch 3100, loss[loss=0.01418, audio_tagging_loss=0.01418, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4950040.80 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:55:24,103 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.262e+01 3.674e+01 3.864e+01 4.007e+01 4.526e+01, threshold=7.728e+01, percent-clipped=0.0 2023-12-23 17:55:35,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.64 vs. 
limit=22.5 2023-12-23 17:55:44,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1259960.0, ans=0.0 2023-12-23 17:56:08,324 INFO [train.py:886] (1/4) Epoch 40, batch 3150, loss[loss=0.01593, audio_tagging_loss=0.01593, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4941702.40 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:56:32,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2023-12-23 17:56:41,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1260360.0, ans=0.125 2023-12-23 17:56:44,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.38 vs. limit=22.5 2023-12-23 17:56:52,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-12-23 17:57:00,307 INFO [train.py:886] (1/4) Epoch 40, batch 3200, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4936559.58 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:57:00,505 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:57:02,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1260493.3333333333, ans=0.125 2023-12-23 17:57:07,586 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.204e+01 3.736e+01 3.854e+01 4.070e+01 4.738e+01, threshold=7.708e+01, percent-clipped=0.0 2023-12-23 17:57:18,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1260560.0, ans=0.2 2023-12-23 17:57:20,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1260626.6666666667, ans=0.125 2023-12-23 17:57:28,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1260626.6666666667, ans=0.125 2023-12-23 17:57:30,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1260693.3333333333, ans=0.1 2023-12-23 17:57:39,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1260693.3333333333, ans=0.125 2023-12-23 17:57:49,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1260760.0, ans=0.2 2023-12-23 17:57:51,662 INFO [train.py:886] (1/4) Epoch 40, batch 3250, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4938681.16 frames. 
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:58:12,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1260960.0, ans=0.1 2023-12-23 17:58:12,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1260960.0, ans=0.0 2023-12-23 17:58:18,345 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:58:27,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2023-12-23 17:58:35,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2023-12-23 17:58:43,813 INFO [train.py:886] (1/4) Epoch 40, batch 3300, loss[loss=0.009768, audio_tagging_loss=0.009768, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4938892.83 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:58:51,268 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.291e+01 3.619e+01 3.846e+01 4.004e+01 5.622e+01, threshold=7.691e+01, percent-clipped=0.0 2023-12-23 17:58:54,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1261226.6666666667, ans=0.2 2023-12-23 17:58:59,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1261226.6666666667, ans=0.0 2023-12-23 17:59:05,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1261293.3333333333, ans=0.125 2023-12-23 17:59:18,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2023-12-23 17:59:23,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-12-23 17:59:35,968 INFO [train.py:886] (1/4) Epoch 40, batch 3350, loss[loss=0.01023, audio_tagging_loss=0.01023, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4945112.54 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:59:40,700 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:59:45,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.09 vs. 
limit=12.0 2023-12-23 17:59:49,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1261560.0, ans=0.5 2023-12-23 17:59:51,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1261560.0, ans=0.125 2023-12-23 17:59:57,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1261626.6666666667, ans=0.0 2023-12-23 18:00:10,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1261693.3333333333, ans=0.1 2023-12-23 18:00:12,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1261693.3333333333, ans=0.2 2023-12-23 18:00:12,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1261693.3333333333, ans=0.125 2023-12-23 18:00:28,505 INFO [train.py:886] (1/4) Epoch 40, batch 3400, loss[loss=0.01124, audio_tagging_loss=0.01124, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4952011.78 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:00:35,080 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.304e+01 3.654e+01 3.811e+01 4.007e+01 4.560e+01, threshold=7.622e+01, percent-clipped=0.0 2023-12-23 18:00:41,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1261893.3333333333, ans=0.025 2023-12-23 18:00:43,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1261893.3333333333, ans=0.125 2023-12-23 18:01:05,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1262026.6666666667, ans=0.125 2023-12-23 18:01:19,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1262160.0, ans=0.0 2023-12-23 18:01:20,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1262160.0, ans=0.125 2023-12-23 18:01:20,798 INFO [train.py:886] (1/4) Epoch 40, batch 3450, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4948460.55 frames. 
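[Annotation] The periodic optim.py:484 WARNING lines summarize recent gradient norms as five quartiles (min, 25%, median, 75%, max). In every such line the printed threshold is exactly Clipping_scale (2.0) times the median — e.g. 2.0 × 3.811e+01 = 7.622e+01 just above — and percent-clipped says how often that threshold was exceeded (0.0 throughout epoch 40, i.e. training is stable). A sketch of median-based adaptive clipping consistent with those numbers; the real logic lives inside icefall's ScaledAdam optimizer with extra handling for fp16 and per-parameter scales, so treat this as an illustration only:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip the global gradient norm to clipping_scale * recent median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        self.num_seen += 1
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale in place
        return norm

    @property
    def percent_clipped(self) -> float:
        return 100.0 * self.num_clipped / max(self.num_seen, 1)
```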
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:01:27,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1262160.0, ans=0.125 2023-12-23 18:01:43,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1262293.3333333333, ans=0.125 2023-12-23 18:02:01,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1262426.6666666667, ans=0.125 2023-12-23 18:02:06,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1262426.6666666667, ans=0.0 2023-12-23 18:02:09,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1262426.6666666667, ans=0.0 2023-12-23 18:02:11,088 INFO [train.py:886] (1/4) Epoch 40, batch 3500, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4947160.12 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:02:13,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1262493.3333333333, ans=0.5 2023-12-23 18:02:18,418 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.696e+01 3.842e+01 3.983e+01 4.882e+01, threshold=7.684e+01, percent-clipped=0.0 2023-12-23 18:02:28,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2023-12-23 18:03:01,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-12-23 18:03:04,011 INFO [train.py:886] (1/4) Epoch 40, batch 3550, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4947885.01 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:03:14,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1262893.3333333333, ans=0.0 2023-12-23 18:03:22,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1262893.3333333333, ans=0.125 2023-12-23 18:03:44,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1263093.3333333333, ans=15.0 2023-12-23 18:03:45,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-12-23 18:03:48,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1263093.3333333333, ans=0.125 2023-12-23 18:03:48,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2023-12-23 18:03:55,063 INFO [train.py:886] (1/4) Epoch 40, batch 3600, loss[loss=0.01518, audio_tagging_loss=0.01518, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4948586.31 frames. 
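[Annotation] In the train.py:886 progress lines, loss[...] is the current batch and tot_loss[...] a frame-weighted running average. The "over N frames" count next to tot_loss hovers near 4.95e6 for this whole stretch, which is what an exponential moving average with decay 1 − 1/200 yields at roughly 25k frames per batch (25000 × 200 = 5e6). That decay constant is an inference from the logged numbers, not read from the code; a sketch under that assumption:

```python
class RunningLoss:
    """Frame-weighted exponential moving average of the training loss."""

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.weighted_loss = 0.0  # decayed sum of loss * frames
        self.frames = 0.0         # decayed sum of frames

    def update(self, loss: float, frames: float) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + loss * frames
        self.frames = self.decay * self.frames + frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)

tot = RunningLoss()
for _ in range(2000):
    tot.update(0.0114, 25000)
print(round(tot.frames))  # ~5.0e6, cf. "over 4.95e6 frames" in the log
```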
], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:03:58,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1263160.0, ans=0.2 2023-12-23 18:04:03,185 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.291e+01 3.684e+01 3.812e+01 3.997e+01 4.511e+01, threshold=7.624e+01, percent-clipped=0.0 2023-12-23 18:04:09,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1263226.6666666667, ans=0.025 2023-12-23 18:04:11,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1263226.6666666667, ans=0.125 2023-12-23 18:04:47,312 INFO [train.py:886] (1/4) Epoch 40, batch 3650, loss[loss=0.00922, audio_tagging_loss=0.00922, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4952820.53 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:04:53,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1263493.3333333333, ans=0.2 2023-12-23 18:05:25,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1263693.3333333333, ans=0.0 2023-12-23 18:05:38,709 INFO [train.py:886] (1/4) Epoch 40, batch 3700, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4954630.56 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:05:46,058 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.278e+01 3.659e+01 3.785e+01 3.953e+01 4.581e+01, threshold=7.570e+01, percent-clipped=0.0 2023-12-23 18:05:53,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1263893.3333333333, ans=0.1 2023-12-23 18:05:57,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1263893.3333333333, ans=0.2 2023-12-23 18:06:24,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1264093.3333333333, ans=0.09899494936611666 2023-12-23 18:06:30,156 INFO [train.py:886] (1/4) Epoch 40, batch 3750, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4949753.35 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:06:42,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.64 vs. limit=22.5 2023-12-23 18:06:45,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-12-23 18:06:46,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1264226.6666666667, ans=0.125 2023-12-23 18:06:46,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2023-12-23 18:06:47,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1264226.6666666667, ans=0.07 2023-12-23 18:06:56,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.24 vs. limit=15.0 2023-12-23 18:07:01,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1264360.0, ans=0.125 2023-12-23 18:07:17,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1264426.6666666667, ans=0.2 2023-12-23 18:07:23,135 INFO [train.py:886] (1/4) Epoch 40, batch 3800, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4947459.68 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:07:29,696 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.212e+01 3.721e+01 3.894e+01 4.067e+01 4.769e+01, threshold=7.788e+01, percent-clipped=0.0 2023-12-23 18:07:42,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0 2023-12-23 18:07:47,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2023-12-23 18:07:49,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1264626.6666666667, ans=0.125 2023-12-23 18:08:02,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1264760.0, ans=0.2 2023-12-23 18:08:13,789 INFO [train.py:886] (1/4) Epoch 40, batch 3850, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4940138.06 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:08:20,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1264826.6666666667, ans=0.05 2023-12-23 18:08:28,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1264893.3333333333, ans=0.2 2023-12-23 18:08:44,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1265026.6666666667, ans=0.0 2023-12-23 18:08:48,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5 2023-12-23 18:08:56,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2023-12-23 18:09:05,579 INFO [train.py:886] (1/4) Epoch 40, batch 3900, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4947197.91 frames. 
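[Annotation] The scaling.py:1022 Whitening entries report a measured statistic of one layer's activations (metric=) against its current limit; when the metric exceeds the limit, the module applies a small penalty gradient that pushes the activation covariance back toward a multiple of the identity (many entries above show the metric safely under the limit, so no penalty fires). The exact statistic is internal to icefall; one plausible whiteness measure with the right behavior — about 1 for white features, growing as the covariance spectrum becomes lopsided, hence comparable to logged limits like 12.0 or 22.5 — is sketched below as an assumption, not the verbatim implementation:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Whiteness of activations x with shape (frames, channels).

    Assumed measure: num_channels * trace(C @ C) / trace(C)**2 per
    channel group, where C is the feature covariance; equals 1.0 when
    C is a multiple of the identity (perfectly "white").
    """
    frames, channels = x.shape
    assert channels % num_groups == 0
    c = channels // num_groups
    x = x.reshape(frames, num_groups, c).transpose(0, 1)   # (g, frames, c)
    covar = torch.matmul(x.transpose(1, 2), x) / frames    # (g, c, c)
    num = (covar * covar).sum(dim=(1, 2))                  # trace(C @ C)
    den = covar.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2   # trace(C)^2
    return (c * num / den).mean()

# white noise scores near 1, far below limits such as 12.0 or 22.5:
print(whitening_metric(torch.randn(10000, 512)))
```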
], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:09:12,986 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.639e+01 3.828e+01 4.025e+01 4.570e+01, threshold=7.656e+01, percent-clipped=0.0 2023-12-23 18:09:18,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1265226.6666666667, ans=0.125 2023-12-23 18:09:26,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.38 vs. limit=5.0 2023-12-23 18:09:44,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1265360.0, ans=0.125 2023-12-23 18:09:55,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1265493.3333333333, ans=0.125 2023-12-23 18:09:56,688 INFO [train.py:886] (1/4) Epoch 40, batch 3950, loss[loss=0.009743, audio_tagging_loss=0.009743, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4954048.98 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:09:57,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1265493.3333333333, ans=0.0 2023-12-23 18:10:01,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2023-12-23 18:10:03,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1265493.3333333333, ans=0.0 2023-12-23 18:10:38,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1265760.0, ans=0.125 2023-12-23 18:10:47,696 INFO [train.py:886] (1/4) Epoch 40, batch 4000, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4957967.59 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:10:54,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1265826.6666666667, ans=0.0 2023-12-23 18:10:55,002 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.346e+01 3.601e+01 3.801e+01 3.964e+01 6.190e+01, threshold=7.601e+01, percent-clipped=0.0 2023-12-23 18:11:29,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5 2023-12-23 18:11:36,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=12.0 2023-12-23 18:11:40,058 INFO [train.py:886] (1/4) Epoch 40, batch 4050, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4956887.33 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:11:43,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1266160.0, ans=0.0 2023-12-23 18:11:53,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.40 vs. 
limit=15.0 2023-12-23 18:11:55,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1266226.6666666667, ans=0.125 2023-12-23 18:12:05,180 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:12:07,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1266293.3333333333, ans=0.95 2023-12-23 18:12:18,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1266360.0, ans=0.0 2023-12-23 18:12:19,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1266360.0, ans=0.125 2023-12-23 18:12:20,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1266426.6666666667, ans=0.125 2023-12-23 18:12:31,440 INFO [train.py:886] (1/4) Epoch 40, batch 4100, loss[loss=0.01111, audio_tagging_loss=0.01111, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4944128.87 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:12:38,724 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.330e+01 3.783e+01 3.912e+01 4.094e+01 5.068e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 18:13:12,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1266760.0, ans=0.125 2023-12-23 18:13:18,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1266760.0, ans=0.0 2023-12-23 18:13:19,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.20 vs. limit=15.0 2023-12-23 18:13:24,057 INFO [train.py:886] (1/4) Epoch 40, batch 4150, loss[loss=0.008774, audio_tagging_loss=0.008774, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4943516.96 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:13:25,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0 2023-12-23 18:13:34,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1266893.3333333333, ans=0.025 2023-12-23 18:13:34,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1266893.3333333333, ans=0.125 2023-12-23 18:13:34,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.88 vs. 
limit=15.0 2023-12-23 18:13:37,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1266893.3333333333, ans=0.2 2023-12-23 18:13:41,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1266893.3333333333, ans=0.125 2023-12-23 18:13:44,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1266960.0, ans=0.0 2023-12-23 18:13:51,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1266960.0, ans=0.1 2023-12-23 18:13:56,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1267026.6666666667, ans=0.2 2023-12-23 18:14:02,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1267026.6666666667, ans=0.125 2023-12-23 18:14:15,843 INFO [train.py:886] (1/4) Epoch 40, batch 4200, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4944491.61 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:14:23,316 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.687e+01 3.822e+01 4.029e+01 4.624e+01, threshold=7.645e+01, percent-clipped=0.0 2023-12-23 18:14:24,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1267160.0, ans=0.2 2023-12-23 18:14:30,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1267226.6666666667, ans=0.0 2023-12-23 18:14:53,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5 2023-12-23 18:14:54,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1267360.0, ans=0.2 2023-12-23 18:14:59,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1267426.6666666667, ans=0.0 2023-12-23 18:15:08,325 INFO [train.py:886] (1/4) Epoch 40, batch 4250, loss[loss=0.01067, audio_tagging_loss=0.01067, over 23973.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4946822.01 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:15:16,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1267493.3333333333, ans=0.2 2023-12-23 18:15:34,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1267626.6666666667, ans=0.1 2023-12-23 18:15:45,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267693.3333333333, ans=0.1 2023-12-23 18:15:59,893 INFO [train.py:886] (1/4) Epoch 40, batch 4300, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4946254.18 frames. 
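[Annotation] The printed learning rate decays smoothly across this stretch: 2.68e-03 up to batch 3650 of epoch 40, 2.67e-03 from batch 3700, and 2.63e-03 once epoch 41 begins. That pattern matches icefall's Eden schedule, which multiplies a base LR by two quartic-root decay factors, one in the batch index and one in the epoch. A sketch of the formula; the base LR and batch index below are assumptions chosen to reproduce the logged values, not values read from this run:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style learning rate, as used in zipformer recipes (sketch)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# The epoch 40 -> 41 step alone shrinks the LR by ~1.2%; with the batch
# index also growing, this reproduces the logged 2.68e-03 -> 2.63e-03:
print(eden_lr(0.045, 185000, 40))  # ~2.7e-03 (assumed batch index)
print(eden_lr(0.045, 185000, 41))  # slightly smaller
```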
], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:16:03,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1267826.6666666667, ans=0.125 2023-12-23 18:16:06,448 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.393e+01 3.668e+01 3.826e+01 3.970e+01 4.663e+01, threshold=7.653e+01, percent-clipped=0.0 2023-12-23 18:16:12,823 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:16:15,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1267893.3333333333, ans=0.035 2023-12-23 18:16:16,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1267893.3333333333, ans=0.0 2023-12-23 18:16:20,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1267960.0, ans=0.125 2023-12-23 18:16:21,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-12-23 18:16:26,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1267960.0, ans=22.5 2023-12-23 18:16:31,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1268026.6666666667, ans=0.125 2023-12-23 18:16:33,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1268026.6666666667, ans=0.125 2023-12-23 18:16:42,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1268093.3333333333, ans=0.0 2023-12-23 18:16:46,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1268093.3333333333, ans=0.0 2023-12-23 18:16:51,159 INFO [train.py:886] (1/4) Epoch 40, batch 4350, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4945942.61 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:16:55,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1268160.0, ans=0.125 2023-12-23 18:17:06,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1268226.6666666667, ans=0.2 2023-12-23 18:17:28,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1268360.0, ans=0.0 2023-12-23 18:17:40,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1268426.6666666667, ans=0.0 2023-12-23 18:17:42,519 INFO [train.py:886] (1/4) Epoch 40, batch 4400, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4943466.27 frames. 
], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:17:50,631 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.283e+01 3.697e+01 3.843e+01 4.013e+01 4.471e+01, threshold=7.687e+01, percent-clipped=0.0 2023-12-23 18:18:35,366 INFO [train.py:886] (1/4) Epoch 40, batch 4450, loss[loss=0.0108, audio_tagging_loss=0.0108, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4945528.87 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:18:36,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2023-12-23 18:18:40,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1268826.6666666667, ans=0.2 2023-12-23 18:18:41,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=12.0 2023-12-23 18:18:42,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1268826.6666666667, ans=0.125 2023-12-23 18:18:55,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1268960.0, ans=0.125 2023-12-23 18:19:09,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1269026.6666666667, ans=0.0 2023-12-23 18:19:27,036 INFO [train.py:886] (1/4) Epoch 40, batch 4500, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4942922.54 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:19:31,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1269160.0, ans=0.125 2023-12-23 18:19:34,277 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.692e+01 3.865e+01 4.053e+01 4.653e+01, threshold=7.730e+01, percent-clipped=0.0 2023-12-23 18:19:34,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1269160.0, ans=0.125 2023-12-23 18:19:37,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1269226.6666666667, ans=0.0 2023-12-23 18:19:41,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1269226.6666666667, ans=0.125 2023-12-23 18:19:42,907 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:20:04,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2023-12-23 18:20:18,860 INFO [train.py:886] (1/4) Epoch 40, batch 4550, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4947265.89 frames. 
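[Annotation] A large share of the ScheduledFloat names end in balancer*.prob, min_positive, max_positive, min_abs, or max_abs (e.g. ans=0.025 and ans=10.0 in nearby entries). These belong to activation balancers: modules that keep per-channel statistics — the fraction of positive activations and the mean absolute value — inside a target range, firing stochastically with the probability given by prob (typically 0.125 here). The real Balancer modifies gradients in the backward pass; the sketch below is a simplified, purely differentiable stand-in for the same constraint:

```python
import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.05,
                     max_positive: float = 0.95,
                     min_abs: float = 0.2,
                     max_abs: float = 10.0) -> torch.Tensor:
    """Penalty on per-channel stats of x with shape (frames, channels).

    Pushes the fraction of positive values and the mean absolute value
    of each channel into [min_positive, max_positive] and
    [min_abs, max_abs]; a simplified stand-in, not icefall's Balancer.
    """
    # smooth proportion-positive per channel (sigmoid as a soft step)
    prop_pos = torch.sigmoid(20.0 * x).mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    pen = ((min_positive - prop_pos).clamp(min=0.0) ** 2).sum()
    pen = pen + ((prop_pos - max_positive).clamp(min=0.0) ** 2).sum()
    pen = pen + ((min_abs - mean_abs).clamp(min=0.0) ** 2).sum()
    pen = pen + ((mean_abs - max_abs).clamp(min=0.0) ** 2).sum()
    return pen
```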
], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:20:50,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1269693.3333333333, ans=0.125 2023-12-23 18:20:55,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1269693.3333333333, ans=0.0 2023-12-23 18:20:58,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1269693.3333333333, ans=0.0 2023-12-23 18:21:01,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1269760.0, ans=0.0 2023-12-23 18:21:03,483 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:21:05,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1269760.0, ans=0.125 2023-12-23 18:21:10,434 INFO [train.py:886] (1/4) Epoch 40, batch 4600, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4947013.35 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:21:11,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1269826.6666666667, ans=0.0 2023-12-23 18:21:12,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1269826.6666666667, ans=0.125 2023-12-23 18:21:14,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1269826.6666666667, ans=10.0 2023-12-23 18:21:17,650 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.202e+01 3.671e+01 3.796e+01 3.991e+01 4.710e+01, threshold=7.593e+01, percent-clipped=0.0 2023-12-23 18:21:29,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1269960.0, ans=0.1 2023-12-23 18:21:31,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-12-23 18:21:37,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1269960.0, ans=0.125 2023-12-23 18:21:39,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2023-12-23 18:21:44,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1270026.6666666667, ans=0.125 2023-12-23 18:21:45,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-12-23 18:21:47,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-12-23 18:21:52,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. 
limit=15.0 2023-12-23 18:22:00,876 INFO [train.py:886] (1/4) Epoch 40, batch 4650, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4953801.10 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:22:05,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0 2023-12-23 18:22:08,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1270160.0, ans=0.125 2023-12-23 18:22:13,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1270226.6666666667, ans=0.125 2023-12-23 18:22:22,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1270293.3333333333, ans=0.125 2023-12-23 18:22:52,823 INFO [train.py:886] (1/4) Epoch 40, batch 4700, loss[loss=0.009867, audio_tagging_loss=0.009867, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4942027.12 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:22:59,155 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.757e+01 3.899e+01 4.092e+01 5.478e+01, threshold=7.799e+01, percent-clipped=0.0 2023-12-23 18:22:59,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1270493.3333333333, ans=0.0 2023-12-23 18:23:07,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1270560.0, ans=0.1 2023-12-23 18:23:30,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0 2023-12-23 18:23:39,623 INFO [train.py:886] (1/4) Epoch 40, batch 4750, loss[loss=0.01044, audio_tagging_loss=0.01044, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4941936.04 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:23:45,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1270826.6666666667, ans=0.025 2023-12-23 18:23:46,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=12.0 2023-12-23 18:24:13,759 INFO [train.py:886] (1/4) Epoch 41, batch 0, loss[loss=0.02418, audio_tagging_loss=0.02418, over 25000.00 frames. ], tot_loss[loss=0.02418, audio_tagging_loss=0.02418, over 25000.00 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:24:13,759 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 18:24:31,888 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6461, 2.9854, 4.1463, 3.8282], device='cuda:1') 2023-12-23 18:24:35,138 INFO [train.py:917] (1/4) Epoch 41, validation: loss=0.03496, audio_tagging_loss=0.03496, over 3737520.00 frames. 2023-12-23 18:24:35,139 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 18:24:50,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. 
limit=15.0 2023-12-23 18:24:59,196 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:25:03,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1271066.6666666667, ans=0.125 2023-12-23 18:25:17,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1271200.0, ans=0.0 2023-12-23 18:25:18,894 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.874e+01 4.258e+01 5.303e+01 1.010e+02, threshold=8.517e+01, percent-clipped=7.0 2023-12-23 18:25:26,257 INFO [train.py:886] (1/4) Epoch 41, batch 50, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01817, audio_tagging_loss=0.01817, over 1112321.24 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:25:54,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1271400.0, ans=0.0 2023-12-23 18:25:58,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1271466.6666666667, ans=0.125 2023-12-23 18:26:07,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1271533.3333333333, ans=0.1 2023-12-23 18:26:11,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1271533.3333333333, ans=0.125 2023-12-23 18:26:13,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1271533.3333333333, ans=0.2 2023-12-23 18:26:18,037 INFO [train.py:886] (1/4) Epoch 41, batch 100, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 1965441.43 frames. 
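[Annotation] During the epoch 41 validation pass, zipformer.py:1858 dumps attn_weights_entropy, one value per attention head (tensor([4.6461, 2.9854, 4.1463, 3.8282]) above). Entropies near ln(src_len) — about 4.6 for roughly 100 frames of context — mean a head attends almost uniformly, while smaller values mean peakier attention. A sketch of the diagnostic, assuming a (heads, batch, tgt_len, src_len) layout for the attention weights (the exact shape in icefall may differ):

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean attention entropy per head.

    attn_weights: (num_heads, batch, tgt_len, src_len), rows sum to 1.
    """
    p = attn_weights.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)  # entropy per query position
    return ent.mean(dim=(1, 2))       # average over batch and queries

w = torch.softmax(torch.randn(4, 2, 100, 100), dim=-1)
print(attn_weights_entropy(w))  # four per-head values, cf. the log line
```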
], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:26:35,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1271666.6666666667, ans=0.2 2023-12-23 18:26:36,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1271666.6666666667, ans=0.0 2023-12-23 18:26:41,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1271733.3333333333, ans=0.1 2023-12-23 18:26:45,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1271733.3333333333, ans=0.125 2023-12-23 18:26:46,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1271733.3333333333, ans=0.0 2023-12-23 18:26:51,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1271800.0, ans=0.125 2023-12-23 18:27:03,063 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.548e+01 3.949e+01 4.182e+01 4.375e+01 5.097e+01, threshold=8.364e+01, percent-clipped=0.0 2023-12-23 18:27:06,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1271866.6666666667, ans=0.0 2023-12-23 18:27:09,814 INFO [train.py:886] (1/4) Epoch 41, batch 150, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 2633935.57 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:27:27,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1272000.0, ans=0.0 2023-12-23 18:27:41,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1272133.3333333333, ans=0.025 2023-12-23 18:27:42,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1272133.3333333333, ans=0.125 2023-12-23 18:27:42,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-12-23 18:28:02,503 INFO [train.py:886] (1/4) Epoch 41, batch 200, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 3151462.22 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:28:03,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1272266.6666666667, ans=0.0 2023-12-23 18:28:26,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1272400.0, ans=0.125 2023-12-23 18:28:27,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1272400.0, ans=0.0 2023-12-23 18:28:37,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. 
limit=15.0 2023-12-23 18:28:42,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1272466.6666666667, ans=0.125 2023-12-23 18:28:46,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0 2023-12-23 18:28:46,898 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.401e+01 3.687e+01 3.842e+01 3.967e+01 4.665e+01, threshold=7.685e+01, percent-clipped=0.0 2023-12-23 18:28:53,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1272600.0, ans=0.0 2023-12-23 18:28:54,261 INFO [train.py:886] (1/4) Epoch 41, batch 250, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 3559701.62 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:29:14,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2023-12-23 18:29:45,609 INFO [train.py:886] (1/4) Epoch 41, batch 300, loss[loss=0.01142, audio_tagging_loss=0.01142, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 3863851.48 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:30:06,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2023-12-23 18:30:29,128 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.673e+01 3.844e+01 4.121e+01 4.840e+01, threshold=7.689e+01, percent-clipped=0.0 2023-12-23 18:30:36,419 INFO [train.py:886] (1/4) Epoch 41, batch 350, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24940.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4101308.06 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:30:36,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1273266.6666666667, ans=0.125 2023-12-23 18:30:38,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1273266.6666666667, ans=0.125 2023-12-23 18:30:56,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1273400.0, ans=0.2 2023-12-23 18:31:03,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1273400.0, ans=0.125 2023-12-23 18:31:06,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1273466.6666666667, ans=0.125 2023-12-23 18:31:27,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1273533.3333333333, ans=0.125 2023-12-23 18:31:28,756 INFO [train.py:886] (1/4) Epoch 41, batch 400, loss[loss=0.009319, audio_tagging_loss=0.009319, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4291562.18 frames. 
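[Annotation] grad_scale tells the fp16 story: 64.0 throughout epoch 40, then 32.0 at the start of epoch 41, 16.0 by batch 50, and back to 32.0 around batch 400 — together with the one-off percent-clipped=7.0 warning at the epoch boundary, this is consistent with dynamic loss scaling halving the scale on inf/nan gradients and growing it back after a run of clean steps. A minimal training step using PyTorch's standard scaler, which behaves this way (the model, criterion, and data names are placeholders):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=64.0)

def train_step(model, optimizer, features, targets, criterion):
    optimizer.zero_grad()
    with autocast():                 # forward in fp16/bf16
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()    # scaled loss avoids fp16 underflow
    scaler.step(optimizer)           # skipped internally on overflow
    scaler.update()                  # halves or grows the scale as needed
    return loss.detach(), scaler.get_scale()
```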
], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:31:44,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1273666.6666666667, ans=0.125 2023-12-23 18:31:48,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1273733.3333333333, ans=0.125 2023-12-23 18:32:00,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0 2023-12-23 18:32:07,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-12-23 18:32:13,132 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.313e+01 3.591e+01 3.744e+01 3.939e+01 4.826e+01, threshold=7.487e+01, percent-clipped=0.0 2023-12-23 18:32:20,541 INFO [train.py:886] (1/4) Epoch 41, batch 450, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4442909.55 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:32:29,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1273933.3333333333, ans=0.1 2023-12-23 18:32:39,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2023-12-23 18:32:50,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5 2023-12-23 18:33:12,245 INFO [train.py:886] (1/4) Epoch 41, batch 500, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4558023.96 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:33:14,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1274266.6666666667, ans=0.125 2023-12-23 18:33:25,186 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:33:33,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0 2023-12-23 18:33:38,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1274400.0, ans=0.0 2023-12-23 18:33:46,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.83 vs. 
limit=15.0 2023-12-23 18:33:49,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1274466.6666666667, ans=0.125 2023-12-23 18:33:52,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1274466.6666666667, ans=0.125 2023-12-23 18:33:56,403 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.650e+01 3.820e+01 3.998e+01 4.800e+01, threshold=7.639e+01, percent-clipped=0.0 2023-12-23 18:33:56,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1274533.3333333333, ans=0.2 2023-12-23 18:34:01,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1274533.3333333333, ans=0.125 2023-12-23 18:34:03,789 INFO [train.py:886] (1/4) Epoch 41, batch 550, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4646410.28 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:34:03,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1274600.0, ans=0.125 2023-12-23 18:34:15,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1274666.6666666667, ans=0.05 2023-12-23 18:34:23,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1274666.6666666667, ans=22.5 2023-12-23 18:34:45,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1274866.6666666667, ans=0.1 2023-12-23 18:34:56,192 INFO [train.py:886] (1/4) Epoch 41, batch 600, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4710182.47 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:35:03,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1274933.3333333333, ans=0.125 2023-12-23 18:35:21,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1275066.6666666667, ans=0.125 2023-12-23 18:35:21,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2023-12-23 18:35:40,234 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.244e+01 3.747e+01 3.891e+01 4.086e+01 5.120e+01, threshold=7.782e+01, percent-clipped=0.0 2023-12-23 18:35:44,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1275200.0, ans=0.0 2023-12-23 18:35:48,378 INFO [train.py:886] (1/4) Epoch 41, batch 650, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4762231.84 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:35:58,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. 
limit=12.0 2023-12-23 18:36:15,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1275400.0, ans=0.125 2023-12-23 18:36:37,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-12-23 18:36:39,881 INFO [train.py:886] (1/4) Epoch 41, batch 700, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4801515.44 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:36:49,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0 2023-12-23 18:37:05,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1275733.3333333333, ans=0.125 2023-12-23 18:37:08,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1275733.3333333333, ans=0.0 2023-12-23 18:37:16,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1275800.0, ans=0.0 2023-12-23 18:37:19,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1275800.0, ans=0.1 2023-12-23 18:37:22,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2023-12-23 18:37:24,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.750e+01 3.860e+01 4.061e+01 4.621e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 18:37:32,212 INFO [train.py:886] (1/4) Epoch 41, batch 750, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4836321.12 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:37:43,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1276000.0, ans=0.0 2023-12-23 18:37:44,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-12-23 18:37:47,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276000.0, ans=0.1 2023-12-23 18:37:54,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1276066.6666666667, ans=0.0 2023-12-23 18:38:22,878 INFO [train.py:886] (1/4) Epoch 41, batch 800, loss[loss=0.009053, audio_tagging_loss=0.009053, over 22146.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4857767.53 frames. 
], batch size: 107, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:38:32,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1276266.6666666667, ans=0.0 2023-12-23 18:38:53,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276466.6666666667, ans=0.1 2023-12-23 18:39:08,859 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.674e+01 3.793e+01 3.949e+01 4.603e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 18:39:10,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2023-12-23 18:39:15,583 INFO [train.py:886] (1/4) Epoch 41, batch 850, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4883087.18 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:39:35,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2023-12-23 18:39:44,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1276733.3333333333, ans=0.1 2023-12-23 18:39:47,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1276800.0, ans=0.2 2023-12-23 18:39:54,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1276800.0, ans=0.07 2023-12-23 18:40:07,468 INFO [train.py:886] (1/4) Epoch 41, batch 900, loss[loss=0.01152, audio_tagging_loss=0.01152, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4899406.02 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:40:16,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1276933.3333333333, ans=0.0 2023-12-23 18:40:23,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1277000.0, ans=0.125 2023-12-23 18:40:23,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277000.0, ans=0.1 2023-12-23 18:40:52,162 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.699e+01 3.943e+01 4.115e+01 4.708e+01, threshold=7.886e+01, percent-clipped=0.0 2023-12-23 18:40:59,580 INFO [train.py:886] (1/4) Epoch 41, batch 950, loss[loss=0.009015, audio_tagging_loss=0.009015, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4900900.74 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:41:02,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.10 vs. 
limit=22.5 2023-12-23 18:41:14,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1277333.3333333333, ans=0.2 2023-12-23 18:41:20,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1277400.0, ans=0.125 2023-12-23 18:41:25,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1277400.0, ans=0.125 2023-12-23 18:41:52,147 INFO [train.py:886] (1/4) Epoch 41, batch 1000, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4910146.72 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:42:09,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-12-23 18:42:35,892 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.700e+01 3.857e+01 4.081e+01 5.233e+01, threshold=7.714e+01, percent-clipped=0.0 2023-12-23 18:42:43,290 INFO [train.py:886] (1/4) Epoch 41, batch 1050, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4921309.30 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:42:46,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2023-12-23 18:42:50,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1277933.3333333333, ans=0.125 2023-12-23 18:43:08,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1278066.6666666667, ans=0.0 2023-12-23 18:43:34,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2023-12-23 18:43:35,405 INFO [train.py:886] (1/4) Epoch 41, batch 1100, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4929072.33 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:43:36,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2023-12-23 18:43:56,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1278400.0, ans=0.125 2023-12-23 18:44:00,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1278400.0, ans=0.0 2023-12-23 18:44:19,475 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.288e+01 3.657e+01 3.787e+01 4.008e+01 4.824e+01, threshold=7.573e+01, percent-clipped=0.0 2023-12-23 18:44:26,101 INFO [train.py:886] (1/4) Epoch 41, batch 1150, loss[loss=0.01015, audio_tagging_loss=0.01015, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4931074.31 frames. 
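In the optim.py:484 warnings above, the five numbers after "grad-norm quartiles" read as the min/25%/50%/75%/max of recently observed gradient norms, and the reported threshold is the median scaled by Clipping_scale: for instance 2.0 x 3.857e+01 = 7.714e+01 in the first warning on this stretch. A hedged sketch of that bookkeeping follows; the history length and the exact clipping rule are assumptions for illustration.

```python
import torch

def clip_by_median(grad_norms: list, new_norm: float,
                   clipping_scale: float = 2.0) -> float:
    """Track recent grad norms; clip against clipping_scale * median.

    Returns the factor to apply to this step's gradients. The quartile
    printout mirrors the 'grad-norm quartiles ... threshold=...' warnings
    above; the history length (200 here) is an assumed detail.
    """
    grad_norms.append(new_norm)
    del grad_norms[:-200]  # keep a bounded history of recent norms
    t = torch.tensor(grad_norms)
    q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # scale * median
    print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}")
    return min(1.0, threshold / (new_norm + 1e-20))

# usage: scale gradients down only when the current norm exceeds the threshold
history = [36.5, 37.9, 40.1, 38.2]
scale = clip_by_median(history, new_norm=39.0)
```

On this reading, the constant percent-clipped=0.0 through this stretch simply says that no step's gradient norm exceeded twice the running median.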
], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:44:32,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1278600.0, ans=0.125 2023-12-23 18:44:35,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2023-12-23 18:44:40,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1278666.6666666667, ans=0.125 2023-12-23 18:44:49,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1278733.3333333333, ans=0.07 2023-12-23 18:45:03,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-12-23 18:45:04,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1278800.0, ans=0.0 2023-12-23 18:45:13,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1278866.6666666667, ans=0.0 2023-12-23 18:45:18,180 INFO [train.py:886] (1/4) Epoch 41, batch 1200, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4938201.98 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:45:28,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-12-23 18:45:34,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1279000.0, ans=0.07 2023-12-23 18:45:36,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1279000.0, ans=0.0 2023-12-23 18:45:52,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=22.5 2023-12-23 18:45:59,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1279200.0, ans=0.0 2023-12-23 18:46:02,172 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.367e+01 3.706e+01 3.872e+01 4.011e+01 4.696e+01, threshold=7.743e+01, percent-clipped=0.0 2023-12-23 18:46:05,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1279200.0, ans=0.125 2023-12-23 18:46:09,433 INFO [train.py:886] (1/4) Epoch 41, batch 1250, loss[loss=0.01158, audio_tagging_loss=0.01158, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4928761.07 frames. 
], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:46:14,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1279266.6666666667, ans=0.1 2023-12-23 18:46:27,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1279333.3333333333, ans=0.125 2023-12-23 18:46:38,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1279400.0, ans=0.2 2023-12-23 18:46:39,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1279466.6666666667, ans=0.2 2023-12-23 18:47:01,529 INFO [train.py:886] (1/4) Epoch 41, batch 1300, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4927539.76 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:47:02,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1279600.0, ans=0.015 2023-12-23 18:47:45,472 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.706e+01 3.903e+01 4.052e+01 5.836e+01, threshold=7.805e+01, percent-clipped=0.0 2023-12-23 18:47:45,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1279866.6666666667, ans=0.2 2023-12-23 18:47:52,845 INFO [train.py:886] (1/4) Epoch 41, batch 1350, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24921.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4928488.79 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:48:01,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1279933.3333333333, ans=0.125 2023-12-23 18:48:08,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1280000.0, ans=0.0 2023-12-23 18:48:12,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1280000.0, ans=0.2 2023-12-23 18:48:16,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1280066.6666666667, ans=0.2 2023-12-23 18:48:35,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1280200.0, ans=0.07 2023-12-23 18:48:40,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1280200.0, ans=0.125 2023-12-23 18:48:45,526 INFO [train.py:886] (1/4) Epoch 41, batch 1400, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4937814.31 frames. 
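Each train.py:886 entry above has a fixed shape: a per-batch loss[...] group and a running tot_loss[...] group, each with a frame count, followed by batch size, lr, and grad_scale. To tabulate or plot a run from these logs, a small parser is enough. The sketch below is written against the exact fields visible here; other recipes may format entries differently.

```python
import re

TRAIN_RE = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), "
    r"loss\[loss=(?P<loss>[\d.e-]+).*?over (?P<frames>[\d.]+) frames\. \], "
    r"tot_loss\[loss=(?P<tot_loss>[\d.e-]+).*?over (?P<tot_frames>[\d.]+) frames\. \], "
    r"batch size: (?P<bs>\d+), lr: (?P<lr>[\d.e-]+), grad_scale: (?P<gs>[\d.]+)"
)

def parse_train_entry(text: str):
    """Extract the numeric fields of one train.py:886 entry, or None."""
    m = TRAIN_RE.search(text)
    return {k: float(v) for k, v in m.groupdict().items()} if m else None

entry = ("Epoch 41, batch 1250, loss[loss=0.01158, audio_tagging_loss=0.01158, "
         "over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, "
         "over 4928761.07 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0")
print(parse_train_entry(entry)["tot_loss"])  # -> 0.0115
```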
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:48:49,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1280266.6666666667, ans=0.125 2023-12-23 18:48:52,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1280266.6666666667, ans=0.07 2023-12-23 18:49:06,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1280400.0, ans=0.5 2023-12-23 18:49:17,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1280466.6666666667, ans=0.0 2023-12-23 18:49:18,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1280466.6666666667, ans=0.125 2023-12-23 18:49:30,415 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.274e+01 3.669e+01 3.890e+01 4.024e+01 4.902e+01, threshold=7.779e+01, percent-clipped=0.0 2023-12-23 18:49:37,011 INFO [train.py:886] (1/4) Epoch 41, batch 1450, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24908.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4943698.82 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:49:51,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1280666.6666666667, ans=0.0 2023-12-23 18:50:21,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1280866.6666666667, ans=0.125 2023-12-23 18:50:28,725 INFO [train.py:886] (1/4) Epoch 41, batch 1500, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4950001.22 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:50:42,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1281000.0, ans=0.0 2023-12-23 18:51:09,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2023-12-23 18:51:13,186 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+01 3.676e+01 3.888e+01 4.062e+01 4.485e+01, threshold=7.775e+01, percent-clipped=0.0 2023-12-23 18:51:20,538 INFO [train.py:886] (1/4) Epoch 41, batch 1550, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4950327.73 frames. 
], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:51:24,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1281266.6666666667, ans=0.1 2023-12-23 18:51:34,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1281333.3333333333, ans=0.09899494936611666 2023-12-23 18:51:50,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1281400.0, ans=0.125 2023-12-23 18:51:58,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1281466.6666666667, ans=0.0 2023-12-23 18:51:58,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2023-12-23 18:52:02,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281533.3333333333, ans=0.1 2023-12-23 18:52:12,607 INFO [train.py:886] (1/4) Epoch 41, batch 1600, loss[loss=0.01363, audio_tagging_loss=0.01363, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4944057.50 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:52:37,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281733.3333333333, ans=0.1 2023-12-23 18:52:40,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1281733.3333333333, ans=0.2 2023-12-23 18:52:45,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2023-12-23 18:52:49,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0 2023-12-23 18:52:56,611 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.686e+01 3.838e+01 4.087e+01 6.862e+01, threshold=7.676e+01, percent-clipped=0.0 2023-12-23 18:52:58,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1281866.6666666667, ans=0.0 2023-12-23 18:53:03,980 INFO [train.py:886] (1/4) Epoch 41, batch 1650, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4945436.70 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:53:06,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281933.3333333333, ans=0.1 2023-12-23 18:53:29,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. 
limit=15.0 2023-12-23 18:53:39,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1282133.3333333333, ans=0.125 2023-12-23 18:53:52,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1282200.0, ans=0.125 2023-12-23 18:53:56,371 INFO [train.py:886] (1/4) Epoch 41, batch 1700, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4947079.65 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:53:56,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=12.0 2023-12-23 18:54:02,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1282266.6666666667, ans=0.1 2023-12-23 18:54:07,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1282333.3333333333, ans=0.125 2023-12-23 18:54:09,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-12-23 18:54:27,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1282466.6666666667, ans=0.125 2023-12-23 18:54:29,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1282466.6666666667, ans=0.1 2023-12-23 18:54:34,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1282466.6666666667, ans=0.1 2023-12-23 18:54:40,557 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.322e+01 3.662e+01 3.806e+01 4.000e+01 4.786e+01, threshold=7.612e+01, percent-clipped=0.0 2023-12-23 18:54:48,042 INFO [train.py:886] (1/4) Epoch 41, batch 1750, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4949691.52 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:54:50,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1282600.0, ans=0.125 2023-12-23 18:54:54,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1282600.0, ans=0.125 2023-12-23 18:55:16,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1282733.3333333333, ans=0.125 2023-12-23 18:55:24,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1282800.0, ans=0.2 2023-12-23 18:55:37,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-12-23 18:55:39,844 INFO [train.py:886] (1/4) Epoch 41, batch 1800, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4952898.62 frames. 
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:55:50,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1283000.0, ans=0.0 2023-12-23 18:55:55,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-12-23 18:56:07,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1283066.6666666667, ans=0.125 2023-12-23 18:56:17,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1283133.3333333333, ans=0.0 2023-12-23 18:56:23,835 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.191e+01 3.787e+01 3.915e+01 4.054e+01 5.277e+01, threshold=7.830e+01, percent-clipped=0.0 2023-12-23 18:56:30,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1283266.6666666667, ans=0.0 2023-12-23 18:56:31,243 INFO [train.py:886] (1/4) Epoch 41, batch 1850, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4958070.11 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:56:39,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1283266.6666666667, ans=0.125 2023-12-23 18:56:47,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1283333.3333333333, ans=0.0 2023-12-23 18:57:02,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-12-23 18:57:12,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1283533.3333333333, ans=0.95 2023-12-23 18:57:14,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1283533.3333333333, ans=10.0 2023-12-23 18:57:16,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1283533.3333333333, ans=0.0 2023-12-23 18:57:22,454 INFO [train.py:886] (1/4) Epoch 41, batch 1900, loss[loss=0.009058, audio_tagging_loss=0.009058, over 24037.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4944557.06 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:57:23,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-12-23 18:57:23,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. 
limit=10.0 2023-12-23 18:57:37,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1283666.6666666667, ans=0.125 2023-12-23 18:57:37,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1283666.6666666667, ans=0.0 2023-12-23 18:57:37,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1283666.6666666667, ans=0.2 2023-12-23 18:57:46,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1283733.3333333333, ans=0.1 2023-12-23 18:57:50,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1283733.3333333333, ans=0.125 2023-12-23 18:57:51,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1283733.3333333333, ans=0.125 2023-12-23 18:57:57,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1283800.0, ans=0.1 2023-12-23 18:58:01,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1283800.0, ans=0.2 2023-12-23 18:58:06,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=15.0 2023-12-23 18:58:06,576 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.411e+01 3.762e+01 3.907e+01 4.041e+01 4.562e+01, threshold=7.814e+01, percent-clipped=0.0 2023-12-23 18:58:13,901 INFO [train.py:886] (1/4) Epoch 41, batch 1950, loss[loss=0.0111, audio_tagging_loss=0.0111, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4946814.55 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:58:23,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2023-12-23 18:58:31,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1284000.0, ans=0.125 2023-12-23 18:58:35,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1284066.6666666667, ans=0.1 2023-12-23 18:58:42,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1284066.6666666667, ans=0.2 2023-12-23 18:59:00,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5 2023-12-23 18:59:00,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1284200.0, ans=0.0 2023-12-23 18:59:01,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-12-23 18:59:06,063 INFO [train.py:886] (1/4) Epoch 41, batch 2000, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4950264.91 frames. 
], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:59:15,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-12-23 18:59:50,425 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.662e+01 3.860e+01 4.077e+01 4.836e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 18:59:56,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1284600.0, ans=0.125 2023-12-23 18:59:57,751 INFO [train.py:886] (1/4) Epoch 41, batch 2050, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4954908.01 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 18:59:57,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1284600.0, ans=0.125 2023-12-23 19:00:03,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1284600.0, ans=0.125 2023-12-23 19:00:21,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2023-12-23 19:00:24,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1284733.3333333333, ans=0.125 2023-12-23 19:00:49,183 INFO [train.py:886] (1/4) Epoch 41, batch 2100, loss[loss=0.009808, audio_tagging_loss=0.009808, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4960692.89 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:00:49,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2023-12-23 19:00:55,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-23 19:01:02,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1285000.0, ans=0.0 2023-12-23 19:01:18,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1285066.6666666667, ans=0.125 2023-12-23 19:01:28,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1285133.3333333333, ans=0.1 2023-12-23 19:01:31,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.33 vs. limit=5.0 2023-12-23 19:01:34,049 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.671e+01 3.827e+01 4.014e+01 4.652e+01, threshold=7.654e+01, percent-clipped=0.0 2023-12-23 19:01:41,387 INFO [train.py:886] (1/4) Epoch 41, batch 2150, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24948.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4957060.80 frames. 
], batch size: 100, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:01:41,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1285266.6666666667, ans=0.04949747468305833 2023-12-23 19:01:53,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-12-23 19:02:03,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1285400.0, ans=0.125 2023-12-23 19:02:04,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1285400.0, ans=0.04949747468305833 2023-12-23 19:02:04,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2023-12-23 19:02:10,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1285400.0, ans=0.125 2023-12-23 19:02:28,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1285533.3333333333, ans=0.125 2023-12-23 19:02:33,080 INFO [train.py:886] (1/4) Epoch 41, batch 2200, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4948941.41 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:02:42,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1285600.0, ans=0.125 2023-12-23 19:02:44,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2023-12-23 19:02:59,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1285733.3333333333, ans=0.125 2023-12-23 19:03:10,829 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:03:18,843 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.225e+01 3.764e+01 3.888e+01 4.003e+01 5.031e+01, threshold=7.777e+01, percent-clipped=0.0 2023-12-23 19:03:19,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1285866.6666666667, ans=0.125 2023-12-23 19:03:22,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1285866.6666666667, ans=0.1 2023-12-23 19:03:25,260 INFO [train.py:886] (1/4) Epoch 41, batch 2250, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4947469.06 frames. 
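The grad_scale field behaves like an automatic-mixed-precision loss scale: it holds at 32.0, doubles to 64.0 at batch 2050 after a long run of overflow-free steps, and drops back to 32.0 at batch 2200, which is what a scaler does on meeting a non-finite fp16 gradient. A generic PyTorch sketch of that mechanism follows; this is not this recipe's actual training step, the names (model, batch, criterion) are placeholders, and the intervals and factors are the library defaults.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the value logged through batch 2000
    growth_factor=2.0,     # 32 -> 64 on sustained finite grads
    backoff_factor=0.5,    # 64 -> 32 on the first inf/nan
    growth_interval=2000,  # finite steps between growth attempts (default)
)

def train_step(model, batch, optimizer, criterion):
    # model/batch/optimizer/criterion are hypothetical placeholders.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the update if grads overflowed
    scaler.update()          # grows or backs off the scale
    return scaler.get_scale()
```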
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:03:25,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1285933.3333333333, ans=0.1 2023-12-23 19:03:37,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1286000.0, ans=0.125 2023-12-23 19:03:43,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1286000.0, ans=0.025 2023-12-23 19:03:48,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1286066.6666666667, ans=0.125 2023-12-23 19:04:05,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1286200.0, ans=0.04949747468305833 2023-12-23 19:04:16,988 INFO [train.py:886] (1/4) Epoch 41, batch 2300, loss[loss=0.007987, audio_tagging_loss=0.007987, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4951746.09 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:04:20,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1286266.6666666667, ans=0.035 2023-12-23 19:04:20,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.00 vs. limit=10.0 2023-12-23 19:04:41,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1286400.0, ans=0.125 2023-12-23 19:04:44,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1286400.0, ans=0.2 2023-12-23 19:04:45,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-12-23 19:05:02,014 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.305e+01 3.676e+01 3.827e+01 3.947e+01 4.404e+01, threshold=7.653e+01, percent-clipped=0.0 2023-12-23 19:05:07,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1286600.0, ans=0.2 2023-12-23 19:05:08,343 INFO [train.py:886] (1/4) Epoch 41, batch 2350, loss[loss=0.01003, audio_tagging_loss=0.01003, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4951540.53 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:05:26,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1286666.6666666667, ans=0.05 2023-12-23 19:06:00,417 INFO [train.py:886] (1/4) Epoch 41, batch 2400, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4954291.53 frames. 
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:06:01,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1286933.3333333333, ans=0.125 2023-12-23 19:06:46,639 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.257e+01 3.616e+01 3.787e+01 3.992e+01 4.640e+01, threshold=7.573e+01, percent-clipped=0.0 2023-12-23 19:06:52,483 INFO [train.py:886] (1/4) Epoch 41, batch 2450, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4958146.81 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:07:00,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1287266.6666666667, ans=0.0 2023-12-23 19:07:11,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-12-23 19:07:11,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1287333.3333333333, ans=0.1 2023-12-23 19:07:18,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1287400.0, ans=0.0 2023-12-23 19:07:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1287466.6666666667, ans=0.125 2023-12-23 19:07:32,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1287466.6666666667, ans=0.2 2023-12-23 19:07:32,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1287533.3333333333, ans=0.125 2023-12-23 19:07:43,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-12-23 19:07:44,512 INFO [train.py:886] (1/4) Epoch 41, batch 2500, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4960463.22 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:07:44,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1287600.0, ans=0.1 2023-12-23 19:07:49,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1287600.0, ans=0.0 2023-12-23 19:07:57,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1287666.6666666667, ans=0.125 2023-12-23 19:08:05,805 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:08:09,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1287733.3333333333, ans=0.0 2023-12-23 19:08:21,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. 
limit=10.0 2023-12-23 19:08:29,632 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.365e+01 3.727e+01 3.879e+01 4.092e+01 4.648e+01, threshold=7.757e+01, percent-clipped=0.0 2023-12-23 19:08:36,082 INFO [train.py:886] (1/4) Epoch 41, batch 2550, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4955581.03 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:08:36,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1287933.3333333333, ans=0.125 2023-12-23 19:09:20,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1288200.0, ans=0.1 2023-12-23 19:09:26,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1288200.0, ans=0.125 2023-12-23 19:09:28,075 INFO [train.py:886] (1/4) Epoch 41, batch 2600, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4955138.66 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:09:45,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1288333.3333333333, ans=0.1 2023-12-23 19:09:48,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1288400.0, ans=0.0 2023-12-23 19:10:04,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=8.0 2023-12-23 19:10:09,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2023-12-23 19:10:13,062 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.737e+01 3.877e+01 4.049e+01 5.026e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 19:10:18,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5 2023-12-23 19:10:20,181 INFO [train.py:886] (1/4) Epoch 41, batch 2650, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4954277.40 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:10:21,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1288600.0, ans=0.125 2023-12-23 19:10:32,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1288666.6666666667, ans=0.0 2023-12-23 19:10:39,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-12-23 19:10:42,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1288733.3333333333, ans=0.0 2023-12-23 19:10:49,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.67 vs. 
limit=22.5 2023-12-23 19:10:56,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-12-23 19:11:11,316 INFO [train.py:886] (1/4) Epoch 41, batch 2700, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4952575.85 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:11:11,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=12.0 2023-12-23 19:11:16,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=22.5 2023-12-23 19:11:16,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1288933.3333333333, ans=0.1 2023-12-23 19:11:29,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1289000.0, ans=0.2 2023-12-23 19:11:35,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-12-23 19:11:56,743 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.268e+01 3.652e+01 3.827e+01 3.975e+01 4.292e+01, threshold=7.655e+01, percent-clipped=0.0 2023-12-23 19:12:03,183 INFO [train.py:886] (1/4) Epoch 41, batch 2750, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4948092.59 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:12:18,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1289333.3333333333, ans=0.125 2023-12-23 19:12:18,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1289333.3333333333, ans=0.0 2023-12-23 19:12:20,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1289333.3333333333, ans=15.0 2023-12-23 19:12:31,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-12-23 19:12:36,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1289466.6666666667, ans=0.125 2023-12-23 19:12:37,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1289466.6666666667, ans=0.0 2023-12-23 19:12:51,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1289533.3333333333, ans=0.5 2023-12-23 19:12:55,084 INFO [train.py:886] (1/4) Epoch 41, batch 2800, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4952307.07 frames. 
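The scaling.py:1022 lines above compare a per-module "Whitening" statistic against a limit (metric=5.16 vs. limit=6.0 and so on), with a constraint presumably applied only when the metric exceeds the limit. One statistic with exactly this qualitative behavior is n * sum(lambda_i^2) / (sum(lambda_i))^2 over the eigenvalues of the feature covariance: it equals 1.0 for a perfectly white (isotropic) covariance and grows as the spectrum spreads. Whether this matches the codebase's precise definition (which also handles num_groups > 1) is an assumption; the sketch below only illustrates the shape of the check.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness statistic for features x of shape (num_frames, num_channels).

    Computes ||C||_F^2 / (mean_diag^2 * n) for the channel covariance C,
    which equals n * sum(lambda_i^2) / (sum(lambda_i))^2 over C's eigenvalues:
    exactly 1.0 when C is a multiple of the identity, larger otherwise.
    (Assumed form for illustration, not necessarily the exact definition.)
    """
    x = x - x.mean(dim=0)                # zero-mean per channel
    n = x.shape[1]
    cov = (x.T @ x) / x.shape[0]         # (n, n) channel covariance
    frob_sq = (cov ** 2).sum()           # = sum of squared eigenvalues
    mean_diag = cov.diagonal().mean()    # = mean eigenvalue
    return (frob_sq / (mean_diag ** 2 * n)).item()

# white noise scores ~1; strongly correlated channels score far above 1
print(whitening_metric(torch.randn(10000, 512)))                      # ~1.05
y = torch.randn(10000, 1) * torch.ones(1, 512)                        # rank-1
print(whitening_metric(y + 0.1 * torch.randn(10000, 512)))            # >> 1
```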
], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:12:58,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1289600.0, ans=0.125 2023-12-23 19:13:32,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1289800.0, ans=0.0 2023-12-23 19:13:41,859 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.762e+01 3.896e+01 4.063e+01 4.589e+01, threshold=7.791e+01, percent-clipped=0.0 2023-12-23 19:13:47,607 INFO [train.py:886] (1/4) Epoch 41, batch 2850, loss[loss=0.01285, audio_tagging_loss=0.01285, over 24750.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4949555.71 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:14:04,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1290000.0, ans=0.09899494936611666 2023-12-23 19:14:13,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1290066.6666666667, ans=0.125 2023-12-23 19:14:14,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1290066.6666666667, ans=0.1 2023-12-23 19:14:27,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1290133.3333333333, ans=0.0 2023-12-23 19:14:27,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-12-23 19:14:28,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1290200.0, ans=0.1 2023-12-23 19:14:31,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1290200.0, ans=0.0 2023-12-23 19:14:35,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1290200.0, ans=0.125 2023-12-23 19:14:38,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-12-23 19:14:39,149 INFO [train.py:886] (1/4) Epoch 41, batch 2900, loss[loss=0.01073, audio_tagging_loss=0.01073, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4949109.01 frames. 
], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:14:55,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1290333.3333333333, ans=0.2 2023-12-23 19:15:01,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1290400.0, ans=0.125 2023-12-23 19:15:05,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1290400.0, ans=0.0 2023-12-23 19:15:15,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1290466.6666666667, ans=0.125 2023-12-23 19:15:19,581 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:15:24,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1290533.3333333333, ans=0.125 2023-12-23 19:15:24,928 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.667e+01 3.837e+01 4.047e+01 4.824e+01, threshold=7.673e+01, percent-clipped=0.0 2023-12-23 19:15:28,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-23 19:15:31,333 INFO [train.py:886] (1/4) Epoch 41, batch 2950, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4954213.92 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:15:41,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1290666.6666666667, ans=0.2 2023-12-23 19:15:45,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1290666.6666666667, ans=0.0 2023-12-23 19:15:54,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=12.0 2023-12-23 19:15:58,463 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:16:00,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1290733.3333333333, ans=0.1 2023-12-23 19:16:23,713 INFO [train.py:886] (1/4) Epoch 41, batch 3000, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4956120.88 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:16:23,714 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 19:16:44,999 INFO [train.py:917] (1/4) Epoch 41, validation: loss=0.03524, audio_tagging_loss=0.03524, over 3737520.00 frames. 
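Two details of the loss bookkeeping are visible directly in these entries. The per-batch loss[...] group is an exact frame-weighted average over one batch, while the fractional frame counts in tot_loss[...] (e.g. 4956120.88 frames against roughly 25000 per batch) indicate a decayed running sum of (loss_sum, frames) pairs that is normalized only when printed; the validation line (train.py:917) instead averages once over the full 3737520-frame dev set. A sketch of such a tracker follows, with the decay constant an inference from the ~5e6 steady-state frame count rather than a value read from the code.

```python
class MetricsTracker(dict):
    """Accumulates weighted sums, e.g. {'loss': loss_sum, 'frames': count}."""

    def __add__(self, other):
        out = MetricsTracker(self)
        for k, v in other.items():
            out[k] = out.get(k, 0.0) + v
        return out

    def __mul__(self, alpha: float):
        return MetricsTracker({k: v * alpha for k, v in self.items()})

    def norm(self, key: str) -> float:
        return self[key] / self["frames"]  # frame-normalized value

# With decay 1 - 1/200 and ~25000 frames per batch, the 'frames' entry
# settles near 200 * 25000 = 5e6, matching the fractional ~4.95e6 frame
# counts printed in the tot_loss[...] groups above. (The decay constant
# is an assumption inferred from those counts.)
tot = MetricsTracker({"loss": 0.0, "frames": 0.0})
for batch_loss_sum, batch_frames in [(305.0, 25000.0)] * 2000:
    info = MetricsTracker({"loss": batch_loss_sum, "frames": batch_frames})
    tot = tot * (1 - 1 / 200) + info
print(tot.norm("loss"), tot["frames"])  # ~0.0122 per frame, frames -> ~5e6
```

The example's steady-state per-frame loss of ~0.0122 matches the tot_loss reported at batch 3000 above.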
2023-12-23 19:16:44,999 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 19:17:06,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1291066.6666666667, ans=0.125 2023-12-23 19:17:11,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1291066.6666666667, ans=0.05 2023-12-23 19:17:15,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.67 vs. limit=10.0 2023-12-23 19:17:26,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1291200.0, ans=0.125 2023-12-23 19:17:29,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1291200.0, ans=0.0 2023-12-23 19:17:30,599 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.633e+01 3.842e+01 3.990e+01 4.593e+01, threshold=7.683e+01, percent-clipped=0.0 2023-12-23 19:17:37,029 INFO [train.py:886] (1/4) Epoch 41, batch 3050, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4956671.79 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:17:53,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1291333.3333333333, ans=0.0 2023-12-23 19:18:11,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1291466.6666666667, ans=0.125 2023-12-23 19:18:15,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2023-12-23 19:18:28,509 INFO [train.py:886] (1/4) Epoch 41, batch 3100, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4960091.90 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:18:33,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1291600.0, ans=0.125 2023-12-23 19:18:51,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2023-12-23 19:19:01,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2023-12-23 19:19:14,104 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.432e+01 3.751e+01 3.878e+01 4.025e+01 4.905e+01, threshold=7.756e+01, percent-clipped=0.0 2023-12-23 19:19:19,792 INFO [train.py:886] (1/4) Epoch 41, batch 3150, loss[loss=0.01017, audio_tagging_loss=0.01017, over 24030.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4946985.42 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:19:28,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1291933.3333333333, ans=0.0 2023-12-23 19:19:39,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1292000.0, ans=0.125 2023-12-23 19:19:40,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-12-23 19:19:41,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1292066.6666666667, ans=0.125 2023-12-23 19:19:42,291 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:19:45,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1292066.6666666667, ans=0.2 2023-12-23 19:19:45,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1292066.6666666667, ans=0.025 2023-12-23 19:19:55,885 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:20:02,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1292200.0, ans=0.125 2023-12-23 19:20:12,138 INFO [train.py:886] (1/4) Epoch 41, batch 3200, loss[loss=0.01193, audio_tagging_loss=0.01193, over 21625.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4942237.52 frames. ], batch size: 107, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:20:14,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1292266.6666666667, ans=10.0 2023-12-23 19:20:18,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1292266.6666666667, ans=0.125 2023-12-23 19:20:25,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1292333.3333333333, ans=0.125 2023-12-23 19:20:26,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1292333.3333333333, ans=0.125 2023-12-23 19:20:36,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=12.0 2023-12-23 19:20:38,993 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:20:40,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.04 vs. 
2023-12-23 19:20:43,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1292466.6666666667, ans=0.125
2023-12-23 19:20:53,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1292533.3333333333, ans=0.1
2023-12-23 19:20:57,348 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.736e+01 3.877e+01 4.151e+01 5.106e+01, threshold=7.754e+01, percent-clipped=0.0
2023-12-23 19:21:04,473 INFO [train.py:886] (1/4) Epoch 41, batch 3250, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4946023.12 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:21:16,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1292666.6666666667, ans=0.125
2023-12-23 19:21:17,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0
2023-12-23 19:21:23,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1292666.6666666667, ans=0.0
2023-12-23 19:21:35,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1292800.0, ans=0.0
2023-12-23 19:21:56,101 INFO [train.py:886] (1/4) Epoch 41, batch 3300, loss[loss=0.01277, audio_tagging_loss=0.01277, over 21741.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4949253.88 frames. ], batch size: 107, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:21:59,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2023-12-23 19:22:11,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1293000.0, ans=0.0
2023-12-23 19:22:18,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0
2023-12-23 19:22:34,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1293133.3333333333, ans=0.2
2023-12-23 19:22:38,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1293200.0, ans=0.125
2023-12-23 19:22:41,790 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.618e+01 3.799e+01 4.010e+01 4.611e+01, threshold=7.598e+01, percent-clipped=0.0
2023-12-23 19:22:42,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1293200.0, ans=0.125
2023-12-23 19:22:47,454 INFO [train.py:886] (1/4) Epoch 41, batch 3350, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4951601.41 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
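The optim.py WARNING records report the quartiles (min, 25%, median, 75%, max) of recent gradient norms plus a clipping threshold. Throughout this span the threshold is consistently twice the logged median, which matches Clipping_scale=2.0 (e.g. 7.754e+01 against a median of 3.877e+01 just above). A sketch under that reading; the sliding window size and bookkeeping are assumptions, not the actual optim.py code:

# Sketch of median-based gradient clipping consistent with the WARNING lines:
# track recent global grad norms, set threshold = clipping_scale * median,
# and scale gradients down when the current norm exceeds it.
from collections import deque
import torch

class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        grads = [p.grad.norm() for p in params if p.grad is not None]
        norm = float(torch.norm(torch.stack(grads)))   # global grad norm
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * float(quartiles[2])   # 2.0 x median
        if norm > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold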
2023-12-23 19:22:53,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1293266.6666666667, ans=0.125
2023-12-23 19:22:58,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1293333.3333333333, ans=0.1
2023-12-23 19:23:00,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1293333.3333333333, ans=0.125
2023-12-23 19:23:00,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1293333.3333333333, ans=0.1
2023-12-23 19:23:03,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1293333.3333333333, ans=0.0
2023-12-23 19:23:05,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1293333.3333333333, ans=0.125
2023-12-23 19:23:17,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1293466.6666666667, ans=0.1
2023-12-23 19:23:39,114 INFO [train.py:886] (1/4) Epoch 41, batch 3400, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4956363.83 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:23:39,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1293600.0, ans=0.1
2023-12-23 19:24:08,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1293733.3333333333, ans=0.125
2023-12-23 19:24:24,807 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.278e+01 3.727e+01 3.891e+01 4.043e+01 4.833e+01, threshold=7.782e+01, percent-clipped=0.0
2023-12-23 19:24:30,516 INFO [train.py:886] (1/4) Epoch 41, batch 3450, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4958278.76 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:24:34,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5
2023-12-23 19:25:00,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1294133.3333333333, ans=0.0
2023-12-23 19:25:19,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1294200.0, ans=0.125
2023-12-23 19:25:23,498 INFO [train.py:886] (1/4) Epoch 41, batch 3500, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4948179.82 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:25:35,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=12.0
2023-12-23 19:25:40,246 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:26:07,727 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.332e+01 3.699e+01 3.860e+01 4.048e+01 4.617e+01, threshold=7.721e+01, percent-clipped=0.0
2023-12-23 19:26:14,144 INFO [train.py:886] (1/4) Epoch 41, batch 3550, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4947546.76 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:26:15,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1294600.0, ans=0.1
2023-12-23 19:26:17,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1294600.0, ans=0.2
2023-12-23 19:26:24,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1294666.6666666667, ans=0.125
2023-12-23 19:26:57,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.39 vs. limit=5.0
2023-12-23 19:27:05,729 INFO [train.py:886] (1/4) Epoch 41, batch 3600, loss[loss=0.009647, audio_tagging_loss=0.009647, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4943376.27 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:27:08,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1294933.3333333333, ans=0.2
2023-12-23 19:27:13,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1294933.3333333333, ans=0.1
2023-12-23 19:27:15,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1295000.0, ans=0.02
2023-12-23 19:27:19,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1295000.0, ans=0.125
2023-12-23 19:27:25,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1295000.0, ans=0.125
2023-12-23 19:27:26,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1295000.0, ans=0.125
2023-12-23 19:27:27,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1295066.6666666667, ans=0.2
2023-12-23 19:27:49,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1295200.0, ans=0.1
2023-12-23 19:27:50,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1295200.0, ans=0.0
2023-12-23 19:27:51,208 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.401e+01 3.671e+01 3.840e+01 4.001e+01 4.394e+01, threshold=7.680e+01, percent-clipped=0.0
2023-12-23 19:27:58,372 INFO [train.py:886] (1/4) Epoch 41, batch 3650, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4944904.67 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
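Each train.py:886 record pairs the current batch's loss (over that batch's frame count) with tot_loss, a frame-weighted aggregate that hovers over roughly the last five million frames in these records. A sketch of such a decayed, frame-weighted running average; the decay constant is an assumption chosen only to reproduce the observed window size:

# Sketch of the frame-weighted running loss behind the "tot_loss[...]" fields:
# each batch contributes loss * num_frames, and older batches are decayed so
# the effective window stays near a few million frames.
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float):
        self.loss_sum = self.loss_sum * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

rl = RunningLoss()
for step in range(1000):
    rl.update(loss=0.0114, num_frames=25000.0)
print(rl.tot_loss, rl.frames)  # settles near 0.0114 over ~5e6 frames, as in the log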
2023-12-23 19:28:08,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1295333.3333333333, ans=0.0
2023-12-23 19:28:13,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0
2023-12-23 19:28:18,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1295400.0, ans=0.125
2023-12-23 19:28:22,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1295400.0, ans=0.0
2023-12-23 19:28:28,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1295466.6666666667, ans=0.125
2023-12-23 19:28:47,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1295600.0, ans=0.125
2023-12-23 19:28:47,974 INFO [train.py:886] (1/4) Epoch 41, batch 3700, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4951903.48 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:29:02,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1295666.6666666667, ans=0.0
2023-12-23 19:29:06,070 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:29:22,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1295800.0, ans=0.125
2023-12-23 19:29:23,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1295800.0, ans=0.2
2023-12-23 19:29:26,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1295800.0, ans=0.0
2023-12-23 19:29:28,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1295866.6666666667, ans=0.1
2023-12-23 19:29:35,184 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.261e+01 3.680e+01 3.875e+01 4.029e+01 4.590e+01, threshold=7.750e+01, percent-clipped=0.0
2023-12-23 19:29:40,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1295933.3333333333, ans=0.125
2023-12-23 19:29:40,937 INFO [train.py:886] (1/4) Epoch 41, batch 3750, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4950557.99 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
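The scaling.py:1118 WithLoss records report an auxiliary penalty attached to self-attention weights; throughout this span it logs loss-sum=0.000e+00, i.e. the regularizer never fired over the reporting interval. A sketch of one way to attach and periodically log such a penalty; the wrapper, penalty function, and wiring here are hypothetical, not the scaling.py design:

# Sketch of a "WithLoss"-style diagnostic: an auxiliary penalty accumulated on
# a submodule's output and logged periodically. loss-sum=0.000e+00 in the
# records above corresponds to the penalty staying at zero.
import torch
import torch.nn as nn

class WithAuxLoss(nn.Module):
    def __init__(self, inner: nn.Module, penalty, name: str):
        super().__init__()
        self.inner, self.penalty, self.name = inner, penalty, name
        self.loss_sum = 0.0

    def forward(self, x):
        y = self.inner(x)
        if self.training:
            self.loss_sum += float(self.penalty(y).detach())
        return y

    def log_and_reset(self):
        print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")
        self.loss_sum = 0.0

attn_proj = WithAuxLoss(nn.Linear(256, 256),
                        penalty=lambda y: torch.relu(y.abs().mean() - 10.0),
                        name="self_attn_weights")
attn_proj.train()
attn_proj(torch.randn(8, 256))
attn_proj.log_and_reset()   # prints loss-sum=0.000e+00 unless activations blow up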
2023-12-23 19:29:50,838 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:30:10,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1296133.3333333333, ans=0.0
2023-12-23 19:30:17,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1296133.3333333333, ans=0.125
2023-12-23 19:30:17,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1296133.3333333333, ans=0.2
2023-12-23 19:30:28,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1296200.0, ans=0.0
2023-12-23 19:30:28,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1296200.0, ans=0.1
2023-12-23 19:30:30,890 INFO [train.py:886] (1/4) Epoch 41, batch 3800, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4949922.41 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:30:32,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1296266.6666666667, ans=0.1
2023-12-23 19:30:37,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0
2023-12-23 19:31:14,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0
2023-12-23 19:31:17,209 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.684e+01 3.876e+01 4.085e+01 4.684e+01, threshold=7.752e+01, percent-clipped=0.0
2023-12-23 19:31:23,012 INFO [train.py:886] (1/4) Epoch 41, batch 3850, loss[loss=0.008606, audio_tagging_loss=0.008606, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4953721.43 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:31:34,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1296666.6666666667, ans=0.1
2023-12-23 19:31:39,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0
2023-12-23 19:31:39,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1296666.6666666667, ans=0.0
2023-12-23 19:31:52,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0
2023-12-23 19:32:02,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1296800.0, ans=0.125
2023-12-23 19:32:06,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1296866.6666666667, ans=0.2
2023-12-23 19:32:09,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1296866.6666666667, ans=10.0
2023-12-23 19:32:16,080 INFO [train.py:886] (1/4) Epoch 41, batch 3900, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4955402.92 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:32:22,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1296933.3333333333, ans=0.09899494936611666
2023-12-23 19:32:26,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1297000.0, ans=0.0
2023-12-23 19:32:33,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1297000.0, ans=0.1
2023-12-23 19:32:43,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1297066.6666666667, ans=0.0
2023-12-23 19:32:44,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1297066.6666666667, ans=0.125
2023-12-23 19:32:44,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1297066.6666666667, ans=0.2
2023-12-23 19:33:00,597 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.392e+01 3.712e+01 3.871e+01 3.981e+01 4.576e+01, threshold=7.742e+01, percent-clipped=0.0
2023-12-23 19:33:02,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5
2023-12-23 19:33:07,020 INFO [train.py:886] (1/4) Epoch 41, batch 3950, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24041.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4956896.54 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
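Many of the ScheduledFloat entries parameterize "balancer" modules via min_positive, max_abs, min_abs, and prob: constraints on per-channel activation statistics that are enforced stochastically with probability prob. The real module nudges violating channels through the backward pass; the sketch below only measures the statistics the scheduled constants refer to, which is a simplifying assumption:

# Sketch of the statistics a balancer constrains: the fraction of positive
# activations per channel (min_positive) and their mean magnitude (max_abs).
import torch

def balancer_stats(x: torch.Tensor, min_positive=0.05, max_abs=10.0):
    # x: (num_frames, num_channels)
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    too_negative = int((frac_positive < min_positive).sum())
    too_large = int((mean_abs > max_abs).sum())
    return too_negative, too_large

x = torch.randn(1000, 384) - 2.0   # skewed, mostly negative activations
print(balancer_stats(x))            # most channels violate min_positive=0.05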
2023-12-23 19:33:08,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1297266.6666666667, ans=0.0
2023-12-23 19:33:17,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1297333.3333333333, ans=0.125
2023-12-23 19:33:19,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1297333.3333333333, ans=0.125
2023-12-23 19:33:52,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1297533.3333333333, ans=0.125
2023-12-23 19:33:53,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1297533.3333333333, ans=0.04949747468305833
2023-12-23 19:33:57,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1297533.3333333333, ans=0.0
2023-12-23 19:33:59,516 INFO [train.py:886] (1/4) Epoch 41, batch 4000, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4960184.09 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:34:02,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1297600.0, ans=0.125
2023-12-23 19:34:05,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1297600.0, ans=0.125
2023-12-23 19:34:05,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1297600.0, ans=0.125
2023-12-23 19:34:11,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1297666.6666666667, ans=0.07
2023-12-23 19:34:22,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1297733.3333333333, ans=0.125
2023-12-23 19:34:28,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0
2023-12-23 19:34:30,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2023-12-23 19:34:43,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2023-12-23 19:34:44,944 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.343e+01 3.739e+01 3.854e+01 4.037e+01 4.729e+01, threshold=7.708e+01, percent-clipped=0.0
2023-12-23 19:34:51,365 INFO [train.py:886] (1/4) Epoch 41, batch 4050, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4961763.22 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0
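Note that the per-record "batch size" varies (99, 100, 107) while per-batch frame counts move with cut length: batches are sized by a total-duration budget rather than a fixed count, so buckets of shorter cuts pack more cuts per batch. A simplified sketch of that duration-based batching; this is an illustration of the idea, not lhotse's actual DynamicBucketingSampler code:

# Sketch of duration-budgeted batching that produces varying batch sizes:
# a batch is closed once adding the next cut would exceed the budget.
def duration_batches(cut_durations, max_duration=1000.0):
    batch, total = [], 0.0
    for i, dur in enumerate(cut_durations):
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(i)
        total += dur
    if batch:
        yield batch

durations = [10.0] * 250 + [9.3] * 250   # two buckets of similar-length cuts
sizes = [len(b) for b in duration_batches(durations)]
print(sizes)   # ~100 cuts per batch for 10 s cuts, ~107 for 9.3 s cuts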
2023-12-23 19:34:55,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1297933.3333333333, ans=0.0
2023-12-23 19:35:02,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1298000.0, ans=0.2
2023-12-23 19:35:43,285 INFO [train.py:886] (1/4) Epoch 41, batch 4100, loss[loss=0.0111, audio_tagging_loss=0.0111, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4954104.83 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:35:59,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1298333.3333333333, ans=0.015
2023-12-23 19:36:02,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1298333.3333333333, ans=0.07
2023-12-23 19:36:05,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1298400.0, ans=0.2
2023-12-23 19:36:16,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1298466.6666666667, ans=0.0
2023-12-23 19:36:27,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1298533.3333333333, ans=0.125
2023-12-23 19:36:29,705 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.286e+01 3.678e+01 3.896e+01 4.080e+01 4.675e+01, threshold=7.792e+01, percent-clipped=0.0
2023-12-23 19:36:35,440 INFO [train.py:886] (1/4) Epoch 41, batch 4150, loss[loss=0.011, audio_tagging_loss=0.011, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4950272.01 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0
2023-12-23 19:36:58,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1298733.3333333333, ans=0.0
2023-12-23 19:37:01,408 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:37:02,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1298733.3333333333, ans=0.125
2023-12-23 19:37:03,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1298733.3333333333, ans=0.0
2023-12-23 19:37:06,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1298800.0, ans=0.0
2023-12-23 19:37:16,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1298866.6666666667, ans=0.2
2023-12-23 19:37:27,143 INFO [train.py:886] (1/4) Epoch 41, batch 4200, loss[loss=0.009697, audio_tagging_loss=0.009697, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4949042.21 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 64.0
2023-12-23 19:37:29,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1298933.3333333333, ans=0.0
2023-12-23 19:37:39,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0
2023-12-23 19:38:05,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0
2023-12-23 19:38:12,739 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.759e+01 3.866e+01 4.012e+01 4.707e+01, threshold=7.732e+01, percent-clipped=0.0
2023-12-23 19:38:19,218 INFO [train.py:886] (1/4) Epoch 41, batch 4250, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4954322.21 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 64.0
2023-12-23 19:38:20,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1299266.6666666667, ans=0.125
2023-12-23 19:38:46,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1299400.0, ans=0.1
2023-12-23 19:38:53,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1299466.6666666667, ans=0.125
2023-12-23 19:39:03,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0
2023-12-23 19:39:04,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1299533.3333333333, ans=0.05
2023-12-23 19:39:11,495 INFO [train.py:886] (1/4) Epoch 41, batch 4300, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4957122.38 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:39:12,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1299600.0, ans=0.0
2023-12-23 19:39:14,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1299600.0, ans=0.125
2023-12-23 19:39:16,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1299600.0, ans=0.125
2023-12-23 19:39:18,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1299600.0, ans=0.0
2023-12-23 19:39:20,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1299666.6666666667, ans=0.125
2023-12-23 19:39:26,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1299666.6666666667, ans=0.125
2023-12-23 19:39:26,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1299666.6666666667, ans=0.125
2023-12-23 19:39:39,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1299733.3333333333, ans=0.1
2023-12-23 19:39:45,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0
2023-12-23 19:39:46,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0
2023-12-23 19:39:55,933 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.398e+01 3.671e+01 3.818e+01 3.950e+01 4.671e+01, threshold=7.635e+01, percent-clipped=0.0
2023-12-23 19:40:02,285 INFO [train.py:886] (1/4) Epoch 41, batch 4350, loss[loss=0.009174, audio_tagging_loss=0.009174, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4957933.54 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:40:18,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1300000.0, ans=0.125
2023-12-23 19:40:18,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-12-23 19:40:22,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1300066.6666666667, ans=0.05
2023-12-23 19:40:38,599 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0
2023-12-23 19:40:43,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0
2023-12-23 19:40:50,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1300200.0, ans=0.0
2023-12-23 19:40:53,764 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:40:54,469 INFO [train.py:886] (1/4) Epoch 41, batch 4400, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4946888.76 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:41:01,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1300266.6666666667, ans=0.1
2023-12-23 19:41:05,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5
2023-12-23 19:41:16,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1300400.0, ans=0.05
2023-12-23 19:41:22,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0
2023-12-23 19:41:29,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0
2023-12-23 19:41:31,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1300466.6666666667, ans=0.125
2023-12-23 19:41:39,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1300533.3333333333, ans=0.0
2023-12-23 19:41:39,977 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.419e+01 3.802e+01 3.969e+01 4.169e+01 5.691e+01, threshold=7.937e+01, percent-clipped=0.0
2023-12-23 19:41:46,404 INFO [train.py:886] (1/4) Epoch 41, batch 4450, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4949890.19 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:41:46,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0
2023-12-23 19:42:04,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1300666.6666666667, ans=0.1
2023-12-23 19:42:26,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1300800.0, ans=0.125
2023-12-23 19:42:30,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1300866.6666666667, ans=0.1
2023-12-23 19:42:33,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1300866.6666666667, ans=0.125
2023-12-23 19:42:37,322 INFO [train.py:886] (1/4) Epoch 41, batch 4500, loss[loss=0.01105, audio_tagging_loss=0.01105, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4940559.09 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:43:04,263 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:43:04,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0
2023-12-23 19:43:08,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1301133.3333333333, ans=0.2
2023-12-23 19:43:10,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301133.3333333333, ans=0.1
2023-12-23 19:43:24,736 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.397e+01 3.696e+01 3.844e+01 4.118e+01 4.848e+01, threshold=7.689e+01, percent-clipped=0.0
2023-12-23 19:43:27,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1301200.0, ans=0.125
2023-12-23 19:43:27,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1301200.0, ans=0.05
2023-12-23 19:43:27,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1301200.0, ans=0.125
2023-12-23 19:43:30,447 INFO [train.py:886] (1/4) Epoch 41, batch 4550, loss[loss=0.01008, audio_tagging_loss=0.01008, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4950460.29 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:43:43,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1301333.3333333333, ans=0.2
2023-12-23 19:43:48,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0
2023-12-23 19:43:48,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301333.3333333333, ans=0.1
2023-12-23 19:43:51,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0
2023-12-23 19:44:21,576 INFO [train.py:886] (1/4) Epoch 41, batch 4600, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4956285.64 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0
2023-12-23 19:44:21,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301600.0, ans=0.1
2023-12-23 19:44:24,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1301600.0, ans=0.0
2023-12-23 19:44:32,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1301666.6666666667, ans=0.07
2023-12-23 19:44:43,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301733.3333333333, ans=0.1
2023-12-23 19:44:50,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1301800.0, ans=0.125
2023-12-23 19:44:51,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1301800.0, ans=0.1
2023-12-23 19:44:53,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1301800.0, ans=0.125
2023-12-23 19:44:55,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=12.0
2023-12-23 19:45:04,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1301866.6666666667, ans=0.125
2023-12-23 19:45:08,421 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.753e+01 3.899e+01 4.048e+01 4.729e+01, threshold=7.798e+01, percent-clipped=0.0
2023-12-23 19:45:12,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1301933.3333333333, ans=0.125
2023-12-23 19:45:12,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1301933.3333333333, ans=0.1
2023-12-23 19:45:13,163 INFO [train.py:886] (1/4) Epoch 41, batch 4650, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4964686.83 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 32.0
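With fp16 training, the "grad_scale" field tracks the loss scale of the AMP gradient scaler: it doubles after a stretch of overflow-free steps (32.0 to 64.0 around batch 4200 above) and halves when inf/nan gradients appear (back to 32.0 by batch 4650, and down to 16.0 early in the next epoch). A sketch of the standard torch.cuda.amp step that produces this behavior; the model, optimizer, and criterion here are placeholders, and a GPU is assumed:

# Sketch of an fp16 training step whose loss scale shows up as "grad_scale":
# GradScaler grows the scale after growth_interval clean steps and backs it
# off when scaled gradients overflow.
import torch

model = torch.nn.Linear(80, 527).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2.6e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(feats, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(feats), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)      # skipped internally if grads overflowed
    scaler.update()             # grows or shrinks the scale
    return float(loss), scaler.get_scale()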
2023-12-23 19:45:22,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1302000.0, ans=0.0
2023-12-23 19:45:26,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1302000.0, ans=0.2
2023-12-23 19:45:30,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2023-12-23 19:45:31,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0
2023-12-23 19:45:49,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1302133.3333333333, ans=0.05
2023-12-23 19:45:52,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1302200.0, ans=0.125
2023-12-23 19:46:03,511 INFO [train.py:886] (1/4) Epoch 41, batch 4700, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4961087.28 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 32.0
2023-12-23 19:46:06,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.38 vs. limit=5.0
2023-12-23 19:46:34,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.03 vs. limit=12.0
2023-12-23 19:46:38,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0
2023-12-23 19:46:41,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1302533.3333333333, ans=0.125
2023-12-23 19:46:46,303 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.793e+01 3.987e+01 4.159e+01 5.000e+01, threshold=7.973e+01, percent-clipped=0.0
2023-12-23 19:46:50,812 INFO [train.py:886] (1/4) Epoch 41, batch 4750, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4956041.79 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 32.0
2023-12-23 19:46:53,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1302600.0, ans=0.2
2023-12-23 19:46:53,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0
2023-12-23 19:46:56,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1302600.0, ans=0.125
2023-12-23 19:47:02,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0
2023-12-23 19:47:24,816 INFO [train.py:886] (1/4) Epoch 42, batch 0, loss[loss=0.02475, audio_tagging_loss=0.02475, over 24012.00 frames. ], tot_loss[loss=0.02475, audio_tagging_loss=0.02475, over 24012.00 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:47:24,817 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 19:47:45,226 INFO [train.py:917] (1/4) Epoch 42, validation: loss=0.03462, audio_tagging_loss=0.03462, over 3737520.00 frames.
2023-12-23 19:47:45,226 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 19:47:46,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302706.6666666667, ans=0.1
2023-12-23 19:47:54,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1302773.3333333333, ans=0.125
2023-12-23 19:47:58,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1302773.3333333333, ans=0.1
2023-12-23 19:48:18,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=22.5
2023-12-23 19:48:23,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5
2023-12-23 19:48:23,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0
2023-12-23 19:48:34,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1302973.3333333333, ans=0.125
2023-12-23 19:48:37,184 INFO [train.py:886] (1/4) Epoch 42, batch 50, loss[loss=0.01551, audio_tagging_loss=0.01551, over 25000.00 frames. ], tot_loss[loss=0.01843, audio_tagging_loss=0.01843, over 1121035.61 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0
2023-12-23 19:48:46,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1303106.6666666667, ans=0.125
2023-12-23 19:48:46,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1303106.6666666667, ans=0.0
2023-12-23 19:48:53,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1303106.6666666667, ans=0.125
2023-12-23 19:49:08,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0
2023-12-23 19:49:08,455 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.767e+01 4.244e+01 4.772e+01 5.544e+01 1.220e+02, threshold=9.545e+01, percent-clipped=3.0
2023-12-23 19:49:09,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1303240.0, ans=0.1
2023-12-23 19:49:14,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1303240.0, ans=0.125
2023-12-23 19:49:28,207 INFO [train.py:886] (1/4) Epoch 42, batch 100, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01581, audio_tagging_loss=0.01581, over 1974129.24 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0
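At the first batch of each epoch (the train.py:909/917/918 records above) the loop pauses to compute a frame-weighted loss over the full dev set (3737520.00 frames here) and reports peak CUDA memory. A sketch of such a periodic validation hook; the dataloader interface and criterion are placeholders, not the recipe's actual signatures:

# Sketch of the validation pass logged at each epoch start.
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, criterion, device="cuda"):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    for feats, targets, num_frames in valid_loader:
        out = model(feats.to(device))
        loss = criterion(out, targets.to(device))
        loss_sum += float(loss) * num_frames
        frames += num_frames
    model.train()
    print(f"validation: loss={loss_sum / frames:.4}, over {frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")
    return loss_sum / frames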
2023-12-23 19:49:42,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2023-12-23 19:49:57,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1303506.6666666667, ans=0.0
2023-12-23 19:50:08,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1303640.0, ans=0.125
2023-12-23 19:50:19,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1303706.6666666667, ans=0.125
2023-12-23 19:50:20,325 INFO [train.py:886] (1/4) Epoch 42, batch 150, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 2638239.57 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0
2023-12-23 19:50:22,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5
2023-12-23 19:50:34,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1303773.3333333333, ans=0.1
2023-12-23 19:50:36,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1303773.3333333333, ans=0.125
2023-12-23 19:50:36,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1303773.3333333333, ans=0.0
2023-12-23 19:50:39,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=10.0
2023-12-23 19:50:46,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1303840.0, ans=0.125
2023-12-23 19:50:49,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0
2023-12-23 19:50:49,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1303840.0, ans=0.0
2023-12-23 19:50:51,579 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.833e+01 4.065e+01 4.320e+01 5.040e+01, threshold=8.130e+01, percent-clipped=0.0
2023-12-23 19:51:02,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1303973.3333333333, ans=0.2
2023-12-23 19:51:12,088 INFO [train.py:886] (1/4) Epoch 42, batch 200, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 3154357.03 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0
2023-12-23 19:51:25,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1304106.6666666667, ans=0.0
2023-12-23 19:51:52,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2023-12-23 19:51:52,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1304306.6666666667, ans=0.125
2023-12-23 19:52:02,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1304373.3333333333, ans=0.0
2023-12-23 19:52:03,099 INFO [train.py:886] (1/4) Epoch 42, batch 250, loss[loss=0.01384, audio_tagging_loss=0.01384, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 3555133.53 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 16.0
2023-12-23 19:52:05,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0
2023-12-23 19:52:19,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1304440.0, ans=0.2
2023-12-23 19:52:20,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0
2023-12-23 19:52:29,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1304506.6666666667, ans=0.1
2023-12-23 19:52:34,083 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.363e+01 3.769e+01 3.930e+01 4.156e+01 4.971e+01, threshold=7.859e+01, percent-clipped=0.0
2023-12-23 19:52:36,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0
2023-12-23 19:52:41,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1304573.3333333333, ans=0.2
2023-12-23 19:52:54,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1304706.6666666667, ans=0.125
2023-12-23 19:52:55,102 INFO [train.py:886] (1/4) Epoch 42, batch 300, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 3863639.41 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 16.0
2023-12-23 19:52:56,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1304706.6666666667, ans=0.125
2023-12-23 19:53:06,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1304773.3333333333, ans=0.0
2023-12-23 19:53:42,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1304973.3333333333, ans=0.125
2023-12-23 19:53:45,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1305040.0, ans=0.09899494936611666
2023-12-23 19:53:46,158 INFO [train.py:886] (1/4) Epoch 42, batch 350, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4104337.89 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 16.0
2023-12-23 19:54:08,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1305173.3333333333, ans=0.0
2023-12-23 19:54:17,294 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.289e+01 3.742e+01 3.933e+01 4.114e+01 4.736e+01, threshold=7.865e+01, percent-clipped=0.0
2023-12-23 19:54:19,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5
2023-12-23 19:54:20,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.10 vs. limit=10.0
2023-12-23 19:54:31,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0
2023-12-23 19:54:38,504 INFO [train.py:886] (1/4) Epoch 42, batch 400, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4294559.19 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:54:45,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1305373.3333333333, ans=0.125
2023-12-23 19:55:13,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0
2023-12-23 19:55:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1305640.0, ans=0.125
2023-12-23 19:55:30,177 INFO [train.py:886] (1/4) Epoch 42, batch 450, loss[loss=0.008424, audio_tagging_loss=0.008424, over 23978.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4438982.00 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:55:38,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1305706.6666666667, ans=0.125
2023-12-23 19:55:39,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0
2023-12-23 19:55:39,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1305773.3333333333, ans=0.1
2023-12-23 19:55:43,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1305773.3333333333, ans=0.0
2023-12-23 19:55:47,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1305773.3333333333, ans=0.125
2023-12-23 19:55:53,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1305840.0, ans=0.125
2023-12-23 19:56:01,853 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.347e+01 3.725e+01 3.880e+01 4.088e+01 4.948e+01, threshold=7.759e+01, percent-clipped=0.0
2023-12-23 19:56:03,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1305906.6666666667, ans=0.125
2023-12-23 19:56:17,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=22.5
2023-12-23 19:56:19,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1305973.3333333333, ans=0.125
2023-12-23 19:56:19,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0
2023-12-23 19:56:22,303 INFO [train.py:886] (1/4) Epoch 42, batch 500, loss[loss=0.009819, audio_tagging_loss=0.009819, over 22028.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4555151.23 frames. ], batch size: 107, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:56:24,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1306040.0, ans=0.2
2023-12-23 19:56:28,935 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:56:34,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1306106.6666666667, ans=0.125
2023-12-23 19:56:44,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1306173.3333333333, ans=0.0
2023-12-23 19:56:52,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1306240.0, ans=0.0
2023-12-23 19:56:54,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1306240.0, ans=0.125
2023-12-23 19:56:56,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1306240.0, ans=0.125
2023-12-23 19:56:56,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1306240.0, ans=0.125
2023-12-23 19:57:15,174 INFO [train.py:886] (1/4) Epoch 42, batch 550, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4649176.96 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:57:46,367 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.310e+01 3.673e+01 3.825e+01 3.941e+01 4.727e+01, threshold=7.651e+01, percent-clipped=0.0
2023-12-23 19:57:54,419 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 19:58:03,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.16 vs. limit=10.0
2023-12-23 19:58:09,080 INFO [train.py:886] (1/4) Epoch 42, batch 600, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24042.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4717890.42 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:58:09,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1306706.6666666667, ans=0.0
2023-12-23 19:58:13,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1306706.6666666667, ans=0.09899494936611666
2023-12-23 19:58:16,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1306706.6666666667, ans=0.05
2023-12-23 19:58:28,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0
2023-12-23 19:58:37,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1306840.0, ans=0.2
2023-12-23 19:58:40,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1306906.6666666667, ans=0.2
2023-12-23 19:58:46,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1306906.6666666667, ans=0.125
2023-12-23 19:59:00,397 INFO [train.py:886] (1/4) Epoch 42, batch 650, loss[loss=0.01219, audio_tagging_loss=0.01219, over 22317.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4763659.35 frames. ], batch size: 107, lr: 2.57e-03, grad_scale: 32.0
2023-12-23 19:59:20,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1307106.6666666667, ans=0.1
2023-12-23 19:59:25,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1307173.3333333333, ans=0.0
2023-12-23 19:59:30,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1307173.3333333333, ans=0.125
2023-12-23 19:59:31,743 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.758e+01 3.912e+01 4.116e+01 4.641e+01, threshold=7.823e+01, percent-clipped=0.0
2023-12-23 19:59:51,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1307306.6666666667, ans=0.2
2023-12-23 19:59:53,704 INFO [train.py:886] (1/4) Epoch 42, batch 700, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4801945.99 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0
], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:59:56,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1307373.3333333333, ans=0.125 2023-12-23 20:00:03,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1307440.0, ans=0.125 2023-12-23 20:00:18,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2023-12-23 20:00:26,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.15 vs. limit=22.5 2023-12-23 20:00:28,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2023-12-23 20:00:34,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1307640.0, ans=0.025 2023-12-23 20:00:41,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1307640.0, ans=0.0 2023-12-23 20:00:45,221 INFO [train.py:886] (1/4) Epoch 42, batch 750, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4840755.18 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 20:00:51,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1307706.6666666667, ans=0.0 2023-12-23 20:00:59,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1307773.3333333333, ans=0.125 2023-12-23 20:01:16,355 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.283e+01 3.690e+01 3.847e+01 4.054e+01 6.024e+01, threshold=7.694e+01, percent-clipped=0.0 2023-12-23 20:01:28,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1307973.3333333333, ans=0.1 2023-12-23 20:01:34,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1307973.3333333333, ans=0.1 2023-12-23 20:01:37,183 INFO [train.py:886] (1/4) Epoch 42, batch 800, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4868744.29 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:01:39,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1308040.0, ans=0.2 2023-12-23 20:01:46,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1308106.6666666667, ans=0.0 2023-12-23 20:01:46,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1308106.6666666667, ans=0.125 2023-12-23 20:01:46,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1308106.6666666667, ans=0.125 2023-12-23 20:02:01,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1308173.3333333333, ans=0.2 2023-12-23 20:02:08,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308240.0, ans=0.1 2023-12-23 20:02:19,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1308306.6666666667, ans=0.125 2023-12-23 20:02:23,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-23 20:02:27,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2023-12-23 20:02:27,616 INFO [train.py:886] (1/4) Epoch 42, batch 850, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4896761.92 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:02:29,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1308373.3333333333, ans=0.0 2023-12-23 20:02:38,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308440.0, ans=0.1 2023-12-23 20:02:49,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1308506.6666666667, ans=0.0 2023-12-23 20:02:58,922 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.387e+01 3.709e+01 3.857e+01 4.015e+01 4.625e+01, threshold=7.713e+01, percent-clipped=0.0 2023-12-23 20:03:06,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1308573.3333333333, ans=0.125 2023-12-23 20:03:07,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1308573.3333333333, ans=0.125 2023-12-23 20:03:08,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1308573.3333333333, ans=0.125 2023-12-23 20:03:18,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.14 vs. limit=10.0 2023-12-23 20:03:19,476 INFO [train.py:886] (1/4) Epoch 42, batch 900, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4906521.75 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:03:19,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1308706.6666666667, ans=0.125 2023-12-23 20:03:28,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1308706.6666666667, ans=15.0 2023-12-23 20:03:38,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2023-12-23 20:03:51,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1308906.6666666667, ans=0.2 2023-12-23 20:04:05,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1308973.3333333333, ans=0.125 2023-12-23 20:04:12,450 INFO [train.py:886] (1/4) Epoch 42, batch 950, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4913955.44 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:04:13,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1309040.0, ans=0.1 2023-12-23 20:04:17,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1309040.0, ans=0.125 2023-12-23 20:04:22,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1309106.6666666667, ans=0.0 2023-12-23 20:04:26,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-12-23 20:04:43,681 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.390e+01 3.775e+01 3.915e+01 4.098e+01 4.959e+01, threshold=7.831e+01, percent-clipped=0.0 2023-12-23 20:04:49,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0 2023-12-23 20:04:50,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1309240.0, ans=0.125 2023-12-23 20:05:02,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1309373.3333333333, ans=0.0 2023-12-23 20:05:02,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1309373.3333333333, ans=0.2 2023-12-23 20:05:02,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1309373.3333333333, ans=0.0 2023-12-23 20:05:04,308 INFO [train.py:886] (1/4) Epoch 42, batch 1000, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4921257.88 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:05:34,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1309573.3333333333, ans=0.125 2023-12-23 20:05:34,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1309573.3333333333, ans=0.0 2023-12-23 20:05:36,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1309573.3333333333, ans=0.125 2023-12-23 20:05:44,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1309573.3333333333, ans=0.125 2023-12-23 20:05:56,010 INFO [train.py:886] (1/4) Epoch 42, batch 1050, loss[loss=0.009854, audio_tagging_loss=0.009854, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4926939.35 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:06:01,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309706.6666666667, ans=0.1 2023-12-23 20:06:17,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2023-12-23 20:06:23,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1309840.0, ans=0.125 2023-12-23 20:06:25,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1309840.0, ans=0.125 2023-12-23 20:06:25,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1309840.0, ans=0.0 2023-12-23 20:06:26,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1309906.6666666667, ans=0.125 2023-12-23 20:06:27,322 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+01 3.716e+01 3.855e+01 4.036e+01 4.580e+01, threshold=7.710e+01, percent-clipped=0.0 2023-12-23 20:06:31,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1309906.6666666667, ans=0.125 2023-12-23 20:06:47,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1310040.0, ans=0.125 2023-12-23 20:06:48,269 INFO [train.py:886] (1/4) Epoch 42, batch 1100, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4927949.95 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:06:52,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.75 vs. 
limit=22.5 2023-12-23 20:07:00,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1310106.6666666667, ans=0.1 2023-12-23 20:07:24,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1310240.0, ans=0.125 2023-12-23 20:07:26,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1310240.0, ans=0.125 2023-12-23 20:07:36,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2023-12-23 20:07:38,710 INFO [train.py:886] (1/4) Epoch 42, batch 1150, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4936376.89 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:07:38,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1310373.3333333333, ans=0.09899494936611666 2023-12-23 20:07:44,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1310373.3333333333, ans=10.0 2023-12-23 20:08:10,039 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.331e+01 3.706e+01 3.860e+01 4.040e+01 4.470e+01, threshold=7.720e+01, percent-clipped=0.0 2023-12-23 20:08:30,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1310640.0, ans=0.0 2023-12-23 20:08:32,220 INFO [train.py:886] (1/4) Epoch 42, batch 1200, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4937307.03 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:08:37,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1310706.6666666667, ans=0.125 2023-12-23 20:09:01,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1310840.0, ans=0.2 2023-12-23 20:09:06,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1310906.6666666667, ans=0.0 2023-12-23 20:09:22,920 INFO [train.py:886] (1/4) Epoch 42, batch 1250, loss[loss=0.01106, audio_tagging_loss=0.01106, over 22284.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4935142.81 frames. ], batch size: 107, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:09:53,464 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 3.763e+01 3.909e+01 4.123e+01 4.708e+01, threshold=7.819e+01, percent-clipped=0.0 2023-12-23 20:10:13,806 INFO [train.py:886] (1/4) Epoch 42, batch 1300, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4932597.43 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:10:14,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1311373.3333333333, ans=0.1 2023-12-23 20:10:29,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1311440.0, ans=0.0 2023-12-23 20:10:29,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-23 20:10:35,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1311506.6666666667, ans=0.0 2023-12-23 20:10:36,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1311506.6666666667, ans=0.125 2023-12-23 20:11:05,803 INFO [train.py:886] (1/4) Epoch 42, batch 1350, loss[loss=0.01087, audio_tagging_loss=0.01087, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4933282.16 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:11:08,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1311706.6666666667, ans=0.0 2023-12-23 20:11:15,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1311773.3333333333, ans=0.1 2023-12-23 20:11:25,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1311840.0, ans=0.125 2023-12-23 20:11:36,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2023-12-23 20:11:36,585 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.709e+01 3.877e+01 4.071e+01 5.035e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 20:11:57,004 INFO [train.py:886] (1/4) Epoch 42, batch 1400, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4939083.53 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:11:59,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1312040.0, ans=0.125 2023-12-23 20:12:00,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1312040.0, ans=0.125 2023-12-23 20:12:00,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1312040.0, ans=0.125 2023-12-23 20:12:13,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1312106.6666666667, ans=0.0 2023-12-23 20:12:17,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1312173.3333333333, ans=0.0 2023-12-23 20:12:23,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1312173.3333333333, ans=0.125 2023-12-23 20:12:33,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1312240.0, ans=0.125 2023-12-23 20:12:45,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1312306.6666666667, ans=15.0 2023-12-23 20:12:49,217 INFO [train.py:886] (1/4) Epoch 42, batch 1450, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4946307.14 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:12:49,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2023-12-23 20:13:10,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2023-12-23 20:13:20,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.643e+01 3.891e+01 4.028e+01 4.685e+01, threshold=7.783e+01, percent-clipped=0.0 2023-12-23 20:13:39,828 INFO [train.py:886] (1/4) Epoch 42, batch 1500, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4949739.96 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:13:41,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.43 vs. limit=5.0 2023-12-23 20:14:27,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1312973.3333333333, ans=0.125 2023-12-23 20:14:32,124 INFO [train.py:886] (1/4) Epoch 42, batch 1550, loss[loss=0.01125, audio_tagging_loss=0.01125, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4953133.67 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:14:52,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1313173.3333333333, ans=0.125 2023-12-23 20:15:03,379 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.724e+01 3.969e+01 4.156e+01 4.573e+01, threshold=7.937e+01, percent-clipped=0.0 2023-12-23 20:15:13,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1313306.6666666667, ans=0.2 2023-12-23 20:15:13,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1313306.6666666667, ans=0.1 2023-12-23 20:15:19,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1313306.6666666667, ans=0.0 2023-12-23 20:15:24,379 INFO [train.py:886] (1/4) Epoch 42, batch 1600, loss[loss=0.01041, audio_tagging_loss=0.01041, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4948800.14 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:15:24,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1313373.3333333333, ans=0.0 2023-12-23 20:15:25,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1313373.3333333333, ans=0.125 2023-12-23 20:15:31,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1313373.3333333333, ans=0.125 2023-12-23 20:15:54,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1313506.6666666667, ans=0.0 2023-12-23 20:15:55,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1313573.3333333333, ans=0.125 2023-12-23 20:16:13,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.13 vs. limit=15.0 2023-12-23 20:16:13,819 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:16:16,136 INFO [train.py:886] (1/4) Epoch 42, batch 1650, loss[loss=0.01021, audio_tagging_loss=0.01021, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4943413.58 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:16:17,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1313706.6666666667, ans=0.2 2023-12-23 20:16:27,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1313773.3333333333, ans=0.2 2023-12-23 20:16:31,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. 
limit=15.0 2023-12-23 20:16:47,632 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.698e+01 3.919e+01 4.098e+01 5.152e+01, threshold=7.837e+01, percent-clipped=0.0 2023-12-23 20:16:57,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1313973.3333333333, ans=0.2 2023-12-23 20:17:00,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=12.0 2023-12-23 20:17:07,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1314040.0, ans=0.125 2023-12-23 20:17:07,869 INFO [train.py:886] (1/4) Epoch 42, batch 1700, loss[loss=0.009577, audio_tagging_loss=0.009577, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4943856.55 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:17:22,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-12-23 20:17:37,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2023-12-23 20:17:43,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-23 20:17:49,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1314306.6666666667, ans=0.125 2023-12-23 20:17:54,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1314306.6666666667, ans=0.0 2023-12-23 20:18:00,808 INFO [train.py:886] (1/4) Epoch 42, batch 1750, loss[loss=0.009973, audio_tagging_loss=0.009973, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4951433.26 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:18:17,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1314440.0, ans=0.1 2023-12-23 20:18:22,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1314506.6666666667, ans=0.125 2023-12-23 20:18:27,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1314506.6666666667, ans=0.125 2023-12-23 20:18:31,278 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.382e+01 3.733e+01 3.893e+01 4.031e+01 4.588e+01, threshold=7.786e+01, percent-clipped=0.0 2023-12-23 20:18:35,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1314573.3333333333, ans=0.125 2023-12-23 20:18:45,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1314640.0, ans=0.1 2023-12-23 20:18:52,544 INFO [train.py:886] (1/4) Epoch 42, batch 1800, loss[loss=0.01068, audio_tagging_loss=0.01068, over 21663.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4951310.72 frames. 
], batch size: 107, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:19:10,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1314773.3333333333, ans=0.125 2023-12-23 20:19:16,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1314840.0, ans=0.1 2023-12-23 20:19:18,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1314840.0, ans=0.125 2023-12-23 20:19:23,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1314906.6666666667, ans=0.0 2023-12-23 20:19:24,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1314906.6666666667, ans=0.125 2023-12-23 20:19:28,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1314906.6666666667, ans=0.125 2023-12-23 20:19:30,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1314906.6666666667, ans=0.125 2023-12-23 20:19:34,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1314973.3333333333, ans=0.125 2023-12-23 20:19:39,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1314973.3333333333, ans=0.125 2023-12-23 20:19:41,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1314973.3333333333, ans=0.1 2023-12-23 20:19:45,033 INFO [train.py:886] (1/4) Epoch 42, batch 1850, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4951271.58 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:19:50,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1315040.0, ans=0.125 2023-12-23 20:20:02,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1315106.6666666667, ans=0.125 2023-12-23 20:20:16,577 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.781e+01 3.954e+01 4.123e+01 5.023e+01, threshold=7.908e+01, percent-clipped=0.0 2023-12-23 20:20:19,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1315240.0, ans=0.0 2023-12-23 20:20:21,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1315240.0, ans=0.0 2023-12-23 20:20:29,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1315306.6666666667, ans=0.1 2023-12-23 20:20:31,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-12-23 20:20:37,028 INFO [train.py:886] (1/4) Epoch 42, batch 1900, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4943589.82 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:20:59,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1315506.6666666667, ans=0.0 2023-12-23 20:21:00,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1315506.6666666667, ans=0.0 2023-12-23 20:21:10,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2023-12-23 20:21:18,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1315640.0, ans=0.0 2023-12-23 20:21:28,609 INFO [train.py:886] (1/4) Epoch 42, batch 1950, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4940852.98 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:21:46,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2023-12-23 20:21:47,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.91 vs. limit=22.5 2023-12-23 20:21:49,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1315840.0, ans=0.125 2023-12-23 20:21:59,666 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.467e+01 3.743e+01 3.848e+01 4.008e+01 4.780e+01, threshold=7.697e+01, percent-clipped=0.0 2023-12-23 20:22:21,590 INFO [train.py:886] (1/4) Epoch 42, batch 2000, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4943638.77 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:22:37,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-12-23 20:22:38,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1316106.6666666667, ans=0.2 2023-12-23 20:22:40,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1316173.3333333333, ans=0.1 2023-12-23 20:22:45,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1316173.3333333333, ans=0.0 2023-12-23 20:22:46,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1316173.3333333333, ans=0.125 2023-12-23 20:22:52,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2023-12-23 20:22:53,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1316240.0, ans=0.125 2023-12-23 20:23:02,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. 
limit=15.0 2023-12-23 20:23:08,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1316306.6666666667, ans=0.5 2023-12-23 20:23:11,337 INFO [train.py:886] (1/4) Epoch 42, batch 2050, loss[loss=0.009753, audio_tagging_loss=0.009753, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4948784.17 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:23:36,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2023-12-23 20:23:38,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1316506.6666666667, ans=0.1 2023-12-23 20:23:40,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.51 vs. limit=10.0 2023-12-23 20:23:41,639 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.285e+01 3.684e+01 3.851e+01 4.044e+01 4.837e+01, threshold=7.702e+01, percent-clipped=0.0 2023-12-23 20:23:51,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2023-12-23 20:24:02,930 INFO [train.py:886] (1/4) Epoch 42, batch 2100, loss[loss=0.01005, audio_tagging_loss=0.01005, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4951196.83 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:24:21,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1316840.0, ans=0.0 2023-12-23 20:24:43,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1316973.3333333333, ans=0.125 2023-12-23 20:24:53,623 INFO [train.py:886] (1/4) Epoch 42, batch 2150, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4955978.63 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:24:59,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1317040.0, ans=0.2 2023-12-23 20:25:21,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1317173.3333333333, ans=0.125 2023-12-23 20:25:24,376 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.516e+01 3.741e+01 3.904e+01 4.093e+01 4.524e+01, threshold=7.808e+01, percent-clipped=0.0 2023-12-23 20:25:31,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1317240.0, ans=0.0 2023-12-23 20:25:33,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1317240.0, ans=0.0 2023-12-23 20:25:41,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1317306.6666666667, ans=0.1 2023-12-23 20:25:45,643 INFO [train.py:886] (1/4) Epoch 42, batch 2200, loss[loss=0.007688, audio_tagging_loss=0.007688, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4950441.74 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:26:05,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-12-23 20:26:05,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1317506.6666666667, ans=0.1 2023-12-23 20:26:09,912 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:26:12,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1317506.6666666667, ans=0.0 2023-12-23 20:26:14,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1317506.6666666667, ans=0.1 2023-12-23 20:26:19,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1317573.3333333333, ans=0.2 2023-12-23 20:26:20,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1317573.3333333333, ans=0.0 2023-12-23 20:26:21,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1317573.3333333333, ans=0.125 2023-12-23 20:26:22,069 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:26:27,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1317640.0, ans=0.0 2023-12-23 20:26:33,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1317640.0, ans=0.0 2023-12-23 20:26:37,034 INFO [train.py:886] (1/4) Epoch 42, batch 2250, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4942683.16 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:26:46,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2023-12-23 20:26:49,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.56 vs. limit=10.0 2023-12-23 20:26:51,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1317773.3333333333, ans=0.125 2023-12-23 20:27:03,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1317840.0, ans=10.0 2023-12-23 20:27:06,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. 
limit=15.0 2023-12-23 20:27:07,753 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.424e+01 3.775e+01 3.919e+01 4.105e+01 4.695e+01, threshold=7.838e+01, percent-clipped=0.0 2023-12-23 20:27:07,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1317906.6666666667, ans=0.125 2023-12-23 20:27:26,585 INFO [train.py:886] (1/4) Epoch 42, batch 2300, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4945484.93 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 64.0 2023-12-23 20:28:00,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-12-23 20:28:20,064 INFO [train.py:886] (1/4) Epoch 42, batch 2350, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4947188.24 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:28:25,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1318373.3333333333, ans=0.0 2023-12-23 20:28:27,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1318373.3333333333, ans=0.2 2023-12-23 20:28:40,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1318506.6666666667, ans=0.125 2023-12-23 20:28:42,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1318506.6666666667, ans=0.0 2023-12-23 20:28:48,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1318506.6666666667, ans=0.125 2023-12-23 20:28:48,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1318506.6666666667, ans=0.125 2023-12-23 20:28:50,512 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.647e+01 3.807e+01 4.043e+01 4.652e+01, threshold=7.613e+01, percent-clipped=0.0 2023-12-23 20:29:04,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1318640.0, ans=0.125 2023-12-23 20:29:08,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1318640.0, ans=0.0 2023-12-23 20:29:10,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1318706.6666666667, ans=0.125 2023-12-23 20:29:12,145 INFO [train.py:886] (1/4) Epoch 42, batch 2400, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4948759.22 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:30:03,216 INFO [train.py:886] (1/4) Epoch 42, batch 2450, loss[loss=0.008903, audio_tagging_loss=0.008903, over 21497.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4944793.36 frames. ], batch size: 107, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:30:06,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.90 vs. 
limit=12.0 2023-12-23 20:30:20,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1319106.6666666667, ans=0.0 2023-12-23 20:30:34,571 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.760e+01 3.936e+01 4.132e+01 5.379e+01, threshold=7.871e+01, percent-clipped=0.0 2023-12-23 20:30:34,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1319240.0, ans=0.2 2023-12-23 20:30:36,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1319240.0, ans=0.125 2023-12-23 20:30:44,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1319306.6666666667, ans=0.0 2023-12-23 20:30:55,607 INFO [train.py:886] (1/4) Epoch 42, batch 2500, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4945081.60 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:30:56,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1319373.3333333333, ans=0.1 2023-12-23 20:31:03,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1319373.3333333333, ans=0.125 2023-12-23 20:31:12,493 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:31:46,303 INFO [train.py:886] (1/4) Epoch 42, batch 2550, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4945129.99 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0 2023-12-23 20:31:52,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1319706.6666666667, ans=0.0 2023-12-23 20:31:55,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1319706.6666666667, ans=0.1 2023-12-23 20:32:03,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1319773.3333333333, ans=0.125 2023-12-23 20:32:03,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-12-23 20:32:11,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1319840.0, ans=0.0 2023-12-23 20:32:15,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1319840.0, ans=0.125 2023-12-23 20:32:17,485 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.464e+01 3.824e+01 3.969e+01 4.161e+01 4.691e+01, threshold=7.938e+01, percent-clipped=0.0 2023-12-23 20:32:39,492 INFO [train.py:886] (1/4) Epoch 42, batch 2600, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4945075.49 frames. 
], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:32:39,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5
2023-12-23 20:32:44,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1320040.0, ans=0.0
2023-12-23 20:32:57,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0
2023-12-23 20:33:31,690 INFO [train.py:886] (1/4) Epoch 42, batch 2650, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4942128.52 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:33:56,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1320506.6666666667, ans=0.125
2023-12-23 20:34:02,590 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.266e+01 3.676e+01 3.866e+01 4.014e+01 4.776e+01, threshold=7.733e+01, percent-clipped=0.0
2023-12-23 20:34:08,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1320573.3333333333, ans=0.2
2023-12-23 20:34:12,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1320640.0, ans=0.125
2023-12-23 20:34:18,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1320640.0, ans=0.0
2023-12-23 20:34:19,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5
2023-12-23 20:34:19,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0
2023-12-23 20:34:22,621 INFO [train.py:886] (1/4) Epoch 42, batch 2700, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4946415.46 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:34:36,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1320773.3333333333, ans=0.125
2023-12-23 20:34:40,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1320773.3333333333, ans=0.0
2023-12-23 20:34:43,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1320840.0, ans=0.1
2023-12-23 20:34:46,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1320840.0, ans=0.125
2023-12-23 20:34:58,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1320906.6666666667, ans=0.125
2023-12-23 20:35:01,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1320906.6666666667, ans=0.125
2023-12-23 20:35:15,201 INFO [train.py:886] (1/4) Epoch 42, batch 2750, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24908.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4948510.56 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:35:15,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1321040.0, ans=0.125
2023-12-23 20:35:32,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1321106.6666666667, ans=0.125
2023-12-23 20:35:43,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1321173.3333333333, ans=0.125
2023-12-23 20:35:46,367 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.326e+01 3.739e+01 3.858e+01 4.112e+01 5.048e+01, threshold=7.715e+01, percent-clipped=0.0
2023-12-23 20:35:47,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0
2023-12-23 20:35:48,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1321240.0, ans=0.04949747468305833
2023-12-23 20:35:58,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1321306.6666666667, ans=0.1
2023-12-23 20:36:05,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1321373.3333333333, ans=0.125
2023-12-23 20:36:07,109 INFO [train.py:886] (1/4) Epoch 42, batch 2800, loss[loss=0.01006, audio_tagging_loss=0.01006, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4947320.05 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:36:18,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1321440.0, ans=0.0
2023-12-23 20:36:24,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1321440.0, ans=0.125
2023-12-23 20:36:40,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1321573.3333333333, ans=0.125
2023-12-23 20:36:59,079 INFO [train.py:886] (1/4) Epoch 42, batch 2850, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4946219.48 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:37:14,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1321773.3333333333, ans=0.125
2023-12-23 20:37:18,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1321773.3333333333, ans=0.125
2023-12-23 20:37:20,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1321840.0, ans=0.2
2023-12-23 20:37:29,691 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.343e+01 3.804e+01 3.965e+01 4.086e+01 5.143e+01, threshold=7.930e+01, percent-clipped=0.0
2023-12-23 20:37:44,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1321973.3333333333, ans=0.125
2023-12-23 20:37:47,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0
2023-12-23 20:37:49,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0
2023-12-23 20:37:51,432 INFO [train.py:886] (1/4) Epoch 42, batch 2900, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4946429.14 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:38:01,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1322106.6666666667, ans=0.125
2023-12-23 20:38:03,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1322106.6666666667, ans=0.0
2023-12-23 20:38:04,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0
2023-12-23 20:38:26,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0
2023-12-23 20:38:37,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1322306.6666666667, ans=0.09899494936611666
2023-12-23 20:38:42,610 INFO [train.py:886] (1/4) Epoch 42, batch 2950, loss[loss=0.01329, audio_tagging_loss=0.01329, over 24934.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4947454.47 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:38:44,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1322373.3333333333, ans=0.0
2023-12-23 20:38:59,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1322440.0, ans=0.1
2023-12-23 20:39:03,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1322506.6666666667, ans=0.0
2023-12-23 20:39:08,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1322506.6666666667, ans=0.125
2023-12-23 20:39:14,046 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.161e+01 3.693e+01 3.854e+01 4.054e+01 4.408e+01, threshold=7.709e+01, percent-clipped=0.0
2023-12-23 20:39:21,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1322573.3333333333, ans=0.125
2023-12-23 20:39:22,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.05 vs. limit=22.5
2023-12-23 20:39:27,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5
2023-12-23 20:39:35,458 INFO [train.py:886] (1/4) Epoch 42, batch 3000, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4950534.08 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:39:35,459 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 20:39:51,339 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6370, 3.9627, 4.0896, 3.8052], device='cuda:1')
2023-12-23 20:39:56,132 INFO [train.py:917] (1/4) Epoch 42, validation: loss=0.03585, audio_tagging_loss=0.03585, over 3737520.00 frames.
2023-12-23 20:39:56,132 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 20:40:46,851 INFO [train.py:886] (1/4) Epoch 42, batch 3050, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4953701.33 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
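The WARNING lines from optim.py above are gradient-clipping diagnostics: the five numbers read as quartiles (min, 25%, median, 75%, max) of recent per-batch gradient norms, and the printed threshold is consistent with Clipping_scale times the median, e.g. 2.0 x 3.866e+01 ~ 7.733e+01 in the first warning of this span. A minimal sketch of that bookkeeping, using assumed names (grad_norms, clipping_scale); the actual icefall optim.py may differ in detail:

    import numpy as np

    def clipping_stats(grad_norms, clipping_scale=2.0):
        # Quartiles in the order the log prints them: min, 25%, median, 75%, max.
        q = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * q[2]  # e.g. 2.0 * 3.866e+01 ~= 7.733e+01
        # Share of batches whose gradient norm exceeded the threshold;
        # percent-clipped=0.0 above means no batch in the window was clipped.
        percent_clipped = 100.0 * float(np.mean(np.asarray(grad_norms) > threshold))
        return q, threshold, percent_clipped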
2023-12-23 20:40:47,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1323040.0, ans=0.125
2023-12-23 20:41:08,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1323173.3333333333, ans=0.2
2023-12-23 20:41:12,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1323173.3333333333, ans=0.125
2023-12-23 20:41:17,727 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+01 3.719e+01 3.883e+01 4.038e+01 4.632e+01, threshold=7.765e+01, percent-clipped=0.0
2023-12-23 20:41:18,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1323240.0, ans=0.0
2023-12-23 20:41:22,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1323240.0, ans=0.09899494936611666
2023-12-23 20:41:29,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=15.0
2023-12-23 20:41:39,288 INFO [train.py:886] (1/4) Epoch 42, batch 3100, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4960418.43 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:41:41,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1323373.3333333333, ans=0.0
2023-12-23 20:41:58,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1323440.0, ans=0.2
2023-12-23 20:42:15,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0
2023-12-23 20:42:20,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1323640.0, ans=0.0
2023-12-23 20:42:31,535 INFO [train.py:886] (1/4) Epoch 42, batch 3150, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4960307.73 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:42:34,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5
2023-12-23 20:42:57,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1323840.0, ans=0.125
2023-12-23 20:43:01,737 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.789e+01 3.951e+01 4.110e+01 5.774e+01, threshold=7.902e+01, percent-clipped=0.0
2023-12-23 20:43:16,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1323973.3333333333, ans=15.0
2023-12-23 20:43:19,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0
2023-12-23 20:43:21,054 INFO [train.py:886] (1/4) Epoch 42, batch 3200, loss[loss=0.008904, audio_tagging_loss=0.008904, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4953064.24 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:43:57,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1324240.0, ans=0.0
2023-12-23 20:44:05,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0
2023-12-23 20:44:13,411 INFO [train.py:886] (1/4) Epoch 42, batch 3250, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24074.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4950001.14 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:44:13,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0
2023-12-23 20:44:23,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1324440.0, ans=0.125
2023-12-23 20:44:23,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1324440.0, ans=0.02
2023-12-23 20:44:33,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1324506.6666666667, ans=0.0
2023-12-23 20:44:44,156 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.306e+01 3.683e+01 3.870e+01 4.032e+01 4.579e+01, threshold=7.741e+01, percent-clipped=0.0
2023-12-23 20:44:49,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1324573.3333333333, ans=0.1
2023-12-23 20:44:51,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1324573.3333333333, ans=0.0
2023-12-23 20:44:56,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1324640.0, ans=0.0
2023-12-23 20:44:59,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1324640.0, ans=0.2
2023-12-23 20:45:03,262 INFO [train.py:886] (1/4) Epoch 42, batch 3300, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4938858.38 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:45:09,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.61 vs. limit=10.0
2023-12-23 20:45:10,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1324706.6666666667, ans=0.1
2023-12-23 20:45:31,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=15.0
2023-12-23 20:45:55,689 INFO [train.py:886] (1/4) Epoch 42, batch 3350, loss[loss=0.009809, audio_tagging_loss=0.009809, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4947308.43 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:45:57,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1325040.0, ans=0.0
2023-12-23 20:46:03,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0
2023-12-23 20:46:08,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1325106.6666666667, ans=0.1
2023-12-23 20:46:12,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1325106.6666666667, ans=0.0
2023-12-23 20:46:15,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0
2023-12-23 20:46:27,254 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.349e+01 3.683e+01 3.878e+01 4.092e+01 4.733e+01, threshold=7.755e+01, percent-clipped=0.0
2023-12-23 20:46:48,521 INFO [train.py:886] (1/4) Epoch 42, batch 3400, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4943244.89 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:46:59,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1325440.0, ans=0.1
2023-12-23 20:47:01,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2023-12-23 20:47:38,515 INFO [train.py:886] (1/4) Epoch 42, batch 3450, loss[loss=0.009114, audio_tagging_loss=0.009114, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4942068.04 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:47:47,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1325706.6666666667, ans=0.125
2023-12-23 20:47:51,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1325773.3333333333, ans=0.125
2023-12-23 20:47:58,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0
2023-12-23 20:48:09,691 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.370e+01 3.738e+01 3.925e+01 4.101e+01 5.009e+01, threshold=7.849e+01, percent-clipped=0.0
2023-12-23 20:48:15,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1325906.6666666667, ans=0.1
2023-12-23 20:48:19,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1325973.3333333333, ans=0.0
2023-12-23 20:48:26,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1325973.3333333333, ans=10.0
2023-12-23 20:48:30,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326040.0, ans=0.1
2023-12-23 20:48:31,451 INFO [train.py:886] (1/4) Epoch 42, batch 3500, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4939903.60 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:48:39,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1326040.0, ans=0.125
2023-12-23 20:48:46,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1326106.6666666667, ans=0.125
2023-12-23 20:48:48,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1326106.6666666667, ans=0.1
2023-12-23 20:48:55,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1326173.3333333333, ans=0.125
2023-12-23 20:49:22,273 INFO [train.py:886] (1/4) Epoch 42, batch 3550, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4938973.90 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:49:27,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1326373.3333333333, ans=0.125
2023-12-23 20:49:35,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1326440.0, ans=0.0
2023-12-23 20:49:50,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1326506.6666666667, ans=0.0
2023-12-23 20:49:53,672 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.537e+01 3.732e+01 3.866e+01 4.048e+01 4.845e+01, threshold=7.731e+01, percent-clipped=0.0
2023-12-23 20:50:14,581 INFO [train.py:886] (1/4) Epoch 42, batch 3600, loss[loss=0.008431, audio_tagging_loss=0.008431, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4946618.08 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:50:21,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1326706.6666666667, ans=0.1
2023-12-23 20:50:23,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1326773.3333333333, ans=0.2
2023-12-23 20:50:32,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326773.3333333333, ans=0.1
2023-12-23 20:50:56,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326973.3333333333, ans=0.1
2023-12-23 20:51:07,221 INFO [train.py:886] (1/4) Epoch 42, batch 3650, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4949001.04 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:51:13,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1327040.0, ans=0.0
2023-12-23 20:51:32,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5
2023-12-23 20:51:38,588 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.685e+01 3.869e+01 4.096e+01 4.964e+01, threshold=7.739e+01, percent-clipped=0.0
2023-12-23 20:51:46,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1327240.0, ans=0.0
2023-12-23 20:51:49,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1327306.6666666667, ans=0.125
2023-12-23 20:51:55,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1327306.6666666667, ans=0.2
2023-12-23 20:51:55,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0
2023-12-23 20:51:58,200 INFO [train.py:886] (1/4) Epoch 42, batch 3700, loss[loss=0.009312, audio_tagging_loss=0.009312, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4949815.37 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:52:11,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2023-12-23 20:52:23,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0
2023-12-23 20:52:27,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1327506.6666666667, ans=0.025
2023-12-23 20:52:39,455 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:52:40,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1327640.0, ans=0.125
2023-12-23 20:52:42,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1327640.0, ans=0.07
2023-12-23 20:52:49,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1327706.6666666667, ans=0.125
2023-12-23 20:52:50,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0
2023-12-23 20:52:50,632 INFO [train.py:886] (1/4) Epoch 42, batch 3750, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4952078.99 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:53:21,794 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.847e+01 3.984e+01 4.181e+01 4.844e+01, threshold=7.969e+01, percent-clipped=0.0
2023-12-23 20:53:21,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1327906.6666666667, ans=0.035
2023-12-23 20:53:23,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1327906.6666666667, ans=0.125
2023-12-23 20:53:24,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2023-12-23 20:53:25,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1327906.6666666667, ans=0.0
2023-12-23 20:53:43,717 INFO [train.py:886] (1/4) Epoch 42, batch 3800, loss[loss=0.01009, audio_tagging_loss=0.01009, over 24750.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4943622.23 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:53:44,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1328040.0, ans=0.0
2023-12-23 20:54:00,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1328106.6666666667, ans=0.05
2023-12-23 20:54:19,703 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:54:34,337 INFO [train.py:886] (1/4) Epoch 42, batch 3850, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4939474.01 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:54:48,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1328440.0, ans=0.125
2023-12-23 20:55:03,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1328506.6666666667, ans=0.1
2023-12-23 20:55:05,496 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.259e+01 3.729e+01 3.866e+01 4.038e+01 4.994e+01, threshold=7.731e+01, percent-clipped=0.0
2023-12-23 20:55:09,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1328573.3333333333, ans=0.07
2023-12-23 20:55:11,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1328573.3333333333, ans=0.2
2023-12-23 20:55:13,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1328573.3333333333, ans=0.125
2023-12-23 20:55:15,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1328640.0, ans=0.125
2023-12-23 20:55:22,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1328640.0, ans=0.125
2023-12-23 20:55:26,614 INFO [train.py:886] (1/4) Epoch 42, batch 3900, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4939257.79 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:55:49,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1328840.0, ans=0.2
2023-12-23 20:56:05,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1328906.6666666667, ans=0.125
2023-12-23 20:56:12,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1328973.3333333333, ans=0.1
2023-12-23 20:56:17,560 INFO [train.py:886] (1/4) Epoch 42, batch 3950, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4943713.10 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:56:21,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1329040.0, ans=0.015
2023-12-23 20:56:31,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329106.6666666667, ans=0.1
2023-12-23 20:56:33,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1329106.6666666667, ans=0.125
2023-12-23 20:56:49,247 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.779e+01 3.890e+01 4.083e+01 4.663e+01, threshold=7.780e+01, percent-clipped=0.0
2023-12-23 20:56:55,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1329240.0, ans=0.125
2023-12-23 20:57:10,636 INFO [train.py:886] (1/4) Epoch 42, batch 4000, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4947494.38 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0
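The ScheduledFloat entries record hyperparameters that are deterministic functions of batch_count; by this point in training every *.dropout_p above reads ans=0.1, i.e. the end of its schedule. A sketch under the assumption that such a schedule interpolates linearly between (batch_count, value) breakpoints and clamps outside them; the breakpoints below are illustrative, not taken from the recipe:

    def scheduled_float(batch_count, points):
        # points: sorted (batch_count, value) breakpoints.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return points[-1][1]  # past the last breakpoint, hold the final value

    # A dropout annealed 0.3 -> 0.1 over the first 20000 batches would sit
    # at 0.1 by batch_count=1329040.0, matching the ans=0.1 entries above.
    print(scheduled_float(1329040.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1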
2023-12-23 20:57:21,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1329440.0, ans=0.0
2023-12-23 20:57:23,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1329440.0, ans=0.125
2023-12-23 20:57:33,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329506.6666666667, ans=0.1
2023-12-23 20:57:34,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1329506.6666666667, ans=10.0
2023-12-23 20:57:41,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1329573.3333333333, ans=0.125
2023-12-23 20:57:58,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1329640.0, ans=0.0
2023-12-23 20:58:03,477 INFO [train.py:886] (1/4) Epoch 42, batch 4050, loss[loss=0.009357, audio_tagging_loss=0.009357, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4946943.03 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:58:06,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1329706.6666666667, ans=0.0
2023-12-23 20:58:30,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1329840.0, ans=0.0
2023-12-23 20:58:30,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1329840.0, ans=0.125
2023-12-23 20:58:32,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1329840.0, ans=0.2
2023-12-23 20:58:35,421 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.762e+01 3.929e+01 4.087e+01 5.094e+01, threshold=7.857e+01, percent-clipped=0.0
2023-12-23 20:58:43,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1329973.3333333333, ans=0.125
2023-12-23 20:58:53,489 INFO [train.py:886] (1/4) Epoch 42, batch 4100, loss[loss=0.0151, audio_tagging_loss=0.0151, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4949408.63 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:59:13,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1330173.3333333333, ans=0.0
2023-12-23 20:59:17,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1330173.3333333333, ans=0.0
2023-12-23 20:59:27,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0
2023-12-23 20:59:37,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1330306.6666666667, ans=0.1
2023-12-23 20:59:39,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=15.0
2023-12-23 20:59:45,167 INFO [train.py:886] (1/4) Epoch 42, batch 4150, loss[loss=0.01059, audio_tagging_loss=0.01059, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4947765.85 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:59:47,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1330373.3333333333, ans=0.05
2023-12-23 20:59:49,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0
2023-12-23 21:00:15,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1330573.3333333333, ans=0.125
2023-12-23 21:00:16,779 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.428e+01 3.800e+01 3.926e+01 4.112e+01 4.569e+01, threshold=7.851e+01, percent-clipped=0.0
2023-12-23 21:00:17,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1330573.3333333333, ans=0.125
2023-12-23 21:00:20,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0
2023-12-23 21:00:36,533 INFO [train.py:886] (1/4) Epoch 42, batch 4200, loss[loss=0.009477, audio_tagging_loss=0.009477, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4950601.99 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:00:53,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1330773.3333333333, ans=0.04949747468305833
2023-12-23 21:01:13,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1330906.6666666667, ans=0.0
2023-12-23 21:01:26,735 INFO [train.py:886] (1/4) Epoch 42, batch 4250, loss[loss=0.009648, audio_tagging_loss=0.009648, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4944257.70 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:01:28,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1331040.0, ans=0.0
2023-12-23 21:01:30,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331040.0, ans=0.1
2023-12-23 21:01:30,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331040.0, ans=0.1
2023-12-23 21:01:37,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1331106.6666666667, ans=0.125
2023-12-23 21:01:54,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1331173.3333333333, ans=0.1
2023-12-23 21:01:59,511 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.725e+01 3.912e+01 4.109e+01 5.134e+01, threshold=7.824e+01, percent-clipped=0.0
2023-12-23 21:02:18,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1331373.3333333333, ans=0.2
2023-12-23 21:02:18,810 INFO [train.py:886] (1/4) Epoch 42, batch 4300, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4941970.27 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:02:21,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331373.3333333333, ans=0.1
2023-12-23 21:02:42,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1331506.6666666667, ans=0.0
2023-12-23 21:02:54,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1331573.3333333333, ans=0.125
2023-12-23 21:03:04,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1331640.0, ans=0.125
2023-12-23 21:03:05,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1331640.0, ans=0.125
2023-12-23 21:03:09,441 INFO [train.py:886] (1/4) Epoch 42, batch 4350, loss[loss=0.01094, audio_tagging_loss=0.01094, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4942224.68 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:03:21,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331773.3333333333, ans=0.1
2023-12-23 21:03:29,715 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:03:41,578 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.795e+01 3.947e+01 4.145e+01 4.884e+01, threshold=7.894e+01, percent-clipped=0.0
2023-12-23 21:03:44,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1331906.6666666667, ans=0.0
2023-12-23 21:04:01,133 INFO [train.py:886] (1/4) Epoch 42, batch 4400, loss[loss=0.00945, audio_tagging_loss=0.00945, over 24102.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4944433.67 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:04:33,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0
2023-12-23 21:04:36,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0
2023-12-23 21:04:47,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1332306.6666666667, ans=0.1
2023-12-23 21:04:52,966 INFO [train.py:886] (1/4) Epoch 42, batch 4450, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4937916.58 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:04:59,852 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:05:15,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1332506.6666666667, ans=0.125
2023-12-23 21:05:24,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1332573.3333333333, ans=0.2
2023-12-23 21:05:26,397 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.806e+01 3.979e+01 4.216e+01 4.903e+01, threshold=7.957e+01, percent-clipped=0.0
2023-12-23 21:05:39,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1332640.0, ans=0.0
2023-12-23 21:05:40,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1332640.0, ans=0.1
2023-12-23 21:05:43,551 INFO [train.py:886] (1/4) Epoch 42, batch 4500, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4938298.56 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:06:03,530 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:06:05,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1332840.0, ans=0.2
2023-12-23 21:06:16,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1332906.6666666667, ans=0.0
2023-12-23 21:06:19,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0
2023-12-23 21:06:25,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1332973.3333333333, ans=0.0
2023-12-23 21:06:34,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2023-12-23 21:06:36,313 INFO [train.py:886] (1/4) Epoch 42, batch 4550, loss[loss=0.008173, audio_tagging_loss=0.008173, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4940042.54 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:06:47,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1333106.6666666667, ans=0.1
2023-12-23 21:06:48,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1333106.6666666667, ans=0.125
2023-12-23 21:06:57,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333173.3333333333, ans=0.1
2023-12-23 21:07:05,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1333173.3333333333, ans=0.125
2023-12-23 21:07:09,417 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.739e+01 3.901e+01 4.086e+01 5.054e+01, threshold=7.802e+01, percent-clipped=0.0
2023-12-23 21:07:23,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1333306.6666666667, ans=0.125
2023-12-23 21:07:28,502 INFO [train.py:886] (1/4) Epoch 42, batch 4600, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4945807.32 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:07:45,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1333440.0, ans=0.0
2023-12-23 21:07:46,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1333440.0, ans=0.125
2023-12-23 21:07:56,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1333506.6666666667, ans=0.125
2023-12-23 21:07:58,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1333573.3333333333, ans=0.0
2023-12-23 21:08:05,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1333573.3333333333, ans=0.05
2023-12-23 21:08:20,868 INFO [train.py:886] (1/4) Epoch 42, batch 4650, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4951429.97 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:08:26,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0
2023-12-23 21:08:31,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0
2023-12-23 21:08:40,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5
2023-12-23 21:08:44,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1333840.0, ans=0.125
2023-12-23 21:08:54,141 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.819e+01 3.947e+01 4.083e+01 4.797e+01, threshold=7.893e+01, percent-clipped=0.0
2023-12-23 21:09:10,782 INFO [train.py:886] (1/4) Epoch 42, batch 4700, loss[loss=0.01111, audio_tagging_loss=0.01111, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4950415.56 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:09:13,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1334040.0, ans=10.0
2023-12-23 21:09:27,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5
2023-12-23 21:09:28,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1334106.6666666667, ans=0.125
2023-12-23 21:09:28,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5
2023-12-23 21:09:29,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1334106.6666666667, ans=0.0
2023-12-23 21:09:34,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1334173.3333333333, ans=0.125
2023-12-23 21:09:55,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0
2023-12-23 21:09:58,625 INFO [train.py:886] (1/4) Epoch 42, batch 4750, loss[loss=0.01442, audio_tagging_loss=0.01442, over 24750.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4945363.09 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0
2023-12-23 21:10:01,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0
2023-12-23 21:10:04,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0
2023-12-23 21:10:10,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0
2023-12-23 21:10:32,654 INFO [train.py:886] (1/4) Epoch 43, batch 0, loss[loss=0.02783, audio_tagging_loss=0.02783, over 21199.00 frames. ], tot_loss[loss=0.02783, audio_tagging_loss=0.02783, over 21199.00 frames. ], batch size: 107, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:10:32,654 INFO [train.py:909] (1/4) Computing validation loss
2023-12-23 21:10:41,402 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4428, 3.5982, 3.3016, 0.7741], device='cuda:1')
2023-12-23 21:10:53,526 INFO [train.py:917] (1/4) Epoch 43, validation: loss=0.0346, audio_tagging_loss=0.0346, over 3737520.00 frames.
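With one record per line, the train.py:886 entries can be skimmed mechanically, e.g. to plot tot_loss across the Epoch 42 -> 43 boundary above. A small assumed helper (not part of icefall); the regex simply mirrors the record format seen in this log:

    import re

    RECORD = re.compile(r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([0-9.]+)")

    def iter_tot_loss(lines):
        # Yields (epoch, batch, tot_loss) for each training record found;
        # validation lines have no "batch N" field and are skipped.
        for line in lines:
            m = RECORD.search(line)
            if m:
                yield int(m.group(1)), int(m.group(2)), float(m.group(3))

    # Usage (hypothetical filename):
    # points = list(iter_tot_loss(open("train.log")))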
2023-12-23 21:10:53,526 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-23 21:10:53,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1334480.0, ans=0.0
2023-12-23 21:11:02,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1334480.0, ans=0.125
2023-12-23 21:11:04,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1334546.6666666667, ans=0.125
2023-12-23 21:11:11,323 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.518e+01 3.885e+01 4.049e+01 4.321e+01 9.986e+01, threshold=8.099e+01, percent-clipped=5.0
2023-12-23 21:11:18,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1334613.3333333333, ans=0.125
2023-12-23 21:11:21,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1334613.3333333333, ans=0.0
2023-12-23 21:11:29,431 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:11:33,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1334680.0, ans=0.125
2023-12-23 21:11:40,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0
2023-12-23 21:11:41,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1334746.6666666667, ans=0.2
2023-12-23 21:11:45,496 INFO [train.py:886] (1/4) Epoch 43, batch 50, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.0183, audio_tagging_loss=0.0183, over 1119495.46 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:11:59,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0
2023-12-23 21:12:10,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1334946.6666666667, ans=0.125
2023-12-23 21:12:14,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1334946.6666666667, ans=0.125
2023-12-23 21:12:37,577 INFO [train.py:886] (1/4) Epoch 43, batch 100, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 1973661.75 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:12:54,979 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.840e+01 4.268e+01 4.587e+01 4.998e+01 5.925e+01, threshold=9.173e+01, percent-clipped=0.0
2023-12-23 21:12:56,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1335213.3333333333, ans=0.0
2023-12-23 21:13:20,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1335413.3333333333, ans=0.0
2023-12-23 21:13:28,994 INFO [train.py:886] (1/4) Epoch 43, batch 150, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 2640437.04 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:13:39,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5
2023-12-23 21:13:57,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1335613.3333333333, ans=0.1
2023-12-23 21:14:22,070 INFO [train.py:886] (1/4) Epoch 43, batch 200, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 3154813.68 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:14:30,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1335813.3333333333, ans=0.0
2023-12-23 21:14:30,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0
2023-12-23 21:14:38,285 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.529e+01 3.831e+01 3.990e+01 4.242e+01 5.537e+01, threshold=7.979e+01, percent-clipped=0.0
2023-12-23 21:14:49,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1335946.6666666667, ans=0.125
2023-12-23 21:14:51,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1335946.6666666667, ans=0.125
2023-12-23 21:14:54,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1336013.3333333333, ans=0.0
2023-12-23 21:15:09,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1336080.0, ans=0.125
2023-12-23 21:15:12,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1336146.6666666667, ans=0.1
2023-12-23 21:15:12,766 INFO [train.py:886] (1/4) Epoch 43, batch 250, loss[loss=0.009198, audio_tagging_loss=0.009198, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 3556297.91 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:15:20,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1336146.6666666667, ans=0.0
2023-12-23 21:15:21,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0
2023-12-23 21:15:28,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1336213.3333333333, ans=0.0
2023-12-23 21:15:34,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1336280.0, ans=0.2
2023-12-23 21:15:47,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1336346.6666666667, ans=0.125
2023-12-23 21:15:48,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1336346.6666666667, ans=0.1
2023-12-23 21:15:49,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1336346.6666666667, ans=10.0
2023-12-23 21:16:02,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1336413.3333333333, ans=0.0
2023-12-23 21:16:06,120 INFO [train.py:886] (1/4) Epoch 43, batch 300, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 3860194.03 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:16:16,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=15.0
2023-12-23 21:16:17,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=22.5
2023-12-23 21:16:22,917 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.462e+01 3.757e+01 3.899e+01 4.075e+01 4.664e+01, threshold=7.797e+01, percent-clipped=0.0
2023-12-23 21:16:24,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1336546.6666666667, ans=0.125
2023-12-23 21:16:28,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1336613.3333333333, ans=0.125
2023-12-23 21:16:46,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1336746.6666666667, ans=0.0
2023-12-23 21:16:51,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0
2023-12-23 21:16:51,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1336746.6666666667, ans=0.2
2023-12-23 21:16:57,858 INFO [train.py:886] (1/4) Epoch 43, batch 350, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4101319.00 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 16.0
2023-12-23 21:17:11,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0
2023-12-23 21:17:20,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1336946.6666666667, ans=0.125
2023-12-23 21:17:35,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1337013.3333333333, ans=0.0
2023-12-23 21:17:44,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0
2023-12-23 21:17:45,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1337080.0, ans=0.1
2023-12-23 21:17:49,703 INFO [train.py:886] (1/4) Epoch 43, batch 400, loss[loss=0.01182, audio_tagging_loss=0.01182, over 22375.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4286080.53 frames. ], batch size: 107, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:18:08,775 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.775e+01 3.908e+01 4.060e+01 5.626e+01, threshold=7.816e+01, percent-clipped=0.0
2023-12-23 21:18:34,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1337413.3333333333, ans=0.125
2023-12-23 21:18:42,390 INFO [train.py:886] (1/4) Epoch 43, batch 450, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4435071.09 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:18:47,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1337480.0, ans=0.0
2023-12-23 21:18:54,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1337546.6666666667, ans=0.0
2023-12-23 21:18:54,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-12-23 21:19:03,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1337613.3333333333, ans=0.125
2023-12-23 21:19:16,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1337680.0, ans=0.1
2023-12-23 21:19:33,098 INFO [train.py:886] (1/4) Epoch 43, batch 500, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4549305.10 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:19:33,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0
2023-12-23 21:19:35,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1337813.3333333333, ans=0.0
2023-12-23 21:19:48,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1337880.0, ans=0.125
2023-12-23 21:19:48,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1337880.0, ans=0.2
2023-12-23 21:19:50,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1337880.0, ans=0.0
2023-12-23 21:19:51,662 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.285e+01 3.753e+01 3.929e+01 4.110e+01 4.615e+01, threshold=7.857e+01, percent-clipped=0.0
2023-12-23 21:19:54,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1337946.6666666667, ans=0.125
2023-12-23 21:20:18,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1338080.0, ans=0.1
2023-12-23 21:20:25,818 INFO [train.py:886] (1/4) Epoch 43, batch 550, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4638081.16 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:20:41,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2023-12-23 21:20:51,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0
2023-12-23 21:20:53,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1338280.0, ans=0.125
2023-12-23 21:20:55,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1338346.6666666667, ans=0.125
2023-12-23 21:21:11,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1338413.3333333333, ans=0.125
2023-12-23 21:21:16,813 INFO [train.py:886] (1/4) Epoch 43, batch 600, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4711702.27 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:21:19,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1338480.0, ans=0.125
2023-12-23 21:21:26,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1338546.6666666667, ans=0.0
2023-12-23 21:21:34,498 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.403e+01 3.800e+01 3.964e+01 4.144e+01 4.733e+01, threshold=7.928e+01, percent-clipped=0.0
2023-12-23 21:21:55,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0
2023-12-23 21:21:57,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1338680.0, ans=0.125
2023-12-23 21:21:58,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2023-12-23 21:22:07,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1338813.3333333333, ans=0.1
2023-12-23 21:22:08,609 INFO [train.py:886] (1/4) Epoch 43, batch 650, loss[loss=0.0114, audio_tagging_loss=0.0114, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4754679.31 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0
2023-12-23 21:22:23,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. limit=10.0
2023-12-23 21:22:28,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1338880.0, ans=0.125
2023-12-23 21:22:29,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1338946.6666666667, ans=0.1
2023-12-23 21:22:30,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1338946.6666666667, ans=0.2
2023-12-23 21:22:30,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1338946.6666666667, ans=0.125
2023-12-23 21:22:38,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1339013.3333333333, ans=0.1
2023-12-23 21:22:47,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1339013.3333333333, ans=15.0
2023-12-23 21:23:01,675 INFO [train.py:886] (1/4) Epoch 43, batch 700, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4793433.81 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:23:05,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1339146.6666666667, ans=0.125
2023-12-23 21:23:17,911 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.835e+01 3.964e+01 4.110e+01 4.993e+01, threshold=7.927e+01, percent-clipped=0.0
2023-12-23 21:23:24,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1339280.0, ans=0.125
2023-12-23 21:23:33,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1339346.6666666667, ans=0.125
2023-12-23 21:23:36,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1339346.6666666667, ans=10.0
2023-12-23 21:23:43,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0
2023-12-23 21:23:52,665 INFO [train.py:886] (1/4) Epoch 43, batch 750, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames.
], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4828563.35 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:24:10,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.67 vs. limit=10.0 2023-12-23 21:24:14,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1339613.3333333333, ans=0.1 2023-12-23 21:24:15,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1339613.3333333333, ans=0.125 2023-12-23 21:24:41,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1339746.6666666667, ans=0.125 2023-12-23 21:24:41,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1339746.6666666667, ans=0.125 2023-12-23 21:24:45,200 INFO [train.py:886] (1/4) Epoch 43, batch 800, loss[loss=0.01131, audio_tagging_loss=0.01131, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4858653.12 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:24:46,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1339813.3333333333, ans=0.125 2023-12-23 21:24:49,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1339813.3333333333, ans=0.0 2023-12-23 21:24:54,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1339880.0, ans=0.1 2023-12-23 21:25:03,503 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.756e+01 3.877e+01 4.084e+01 5.332e+01, threshold=7.753e+01, percent-clipped=0.0 2023-12-23 21:25:05,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1339946.6666666667, ans=0.125 2023-12-23 21:25:12,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1339946.6666666667, ans=0.125 2023-12-23 21:25:13,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1339946.6666666667, ans=0.015 2023-12-23 21:25:16,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1340013.3333333333, ans=0.2 2023-12-23 21:25:21,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1340013.3333333333, ans=0.0 2023-12-23 21:25:29,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1340080.0, ans=0.125 2023-12-23 21:25:31,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1340080.0, ans=0.1 2023-12-23 21:25:38,241 INFO [train.py:886] (1/4) Epoch 43, batch 850, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4885675.51 frames. 
], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:25:39,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1340146.6666666667, ans=0.0 2023-12-23 21:26:20,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1340413.3333333333, ans=0.1 2023-12-23 21:26:29,274 INFO [train.py:886] (1/4) Epoch 43, batch 900, loss[loss=0.01103, audio_tagging_loss=0.01103, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4898162.66 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:26:30,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1340480.0, ans=0.0 2023-12-23 21:26:35,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1340480.0, ans=0.125 2023-12-23 21:26:42,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1340546.6666666667, ans=0.0 2023-12-23 21:26:47,366 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.810e+01 3.949e+01 4.133e+01 4.627e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-23 21:26:54,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2023-12-23 21:26:55,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1340613.3333333333, ans=0.125 2023-12-23 21:26:56,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1340613.3333333333, ans=0.0 2023-12-23 21:26:57,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1340613.3333333333, ans=0.05 2023-12-23 21:26:59,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1340680.0, ans=0.125 2023-12-23 21:27:20,775 INFO [train.py:886] (1/4) Epoch 43, batch 950, loss[loss=0.00948, audio_tagging_loss=0.00948, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4904692.14 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:27:27,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1340813.3333333333, ans=0.0 2023-12-23 21:27:45,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1340946.6666666667, ans=0.125 2023-12-23 21:27:46,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=22.5 2023-12-23 21:27:47,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-12-23 21:27:50,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. 
limit=15.0 2023-12-23 21:27:55,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1341013.3333333333, ans=0.125 2023-12-23 21:28:00,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1341013.3333333333, ans=0.125 2023-12-23 21:28:04,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1341080.0, ans=0.0 2023-12-23 21:28:10,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1341080.0, ans=0.125 2023-12-23 21:28:13,279 INFO [train.py:886] (1/4) Epoch 43, batch 1000, loss[loss=0.01111, audio_tagging_loss=0.01111, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4916269.81 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:28:14,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1341146.6666666667, ans=0.0 2023-12-23 21:28:19,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1341146.6666666667, ans=0.1 2023-12-23 21:28:30,323 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.266e+01 3.775e+01 3.954e+01 4.117e+01 5.148e+01, threshold=7.909e+01, percent-clipped=0.0 2023-12-23 21:28:38,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1341280.0, ans=0.0 2023-12-23 21:28:47,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1341346.6666666667, ans=0.125 2023-12-23 21:28:59,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1341413.3333333333, ans=0.125 2023-12-23 21:29:04,835 INFO [train.py:886] (1/4) Epoch 43, batch 1050, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4922123.34 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:29:05,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1341480.0, ans=0.2 2023-12-23 21:29:12,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1341480.0, ans=0.125 2023-12-23 21:29:18,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1341546.6666666667, ans=0.125 2023-12-23 21:29:21,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1341546.6666666667, ans=0.125 2023-12-23 21:29:31,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.74 vs. limit=10.0 2023-12-23 21:29:49,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1341746.6666666667, ans=0.125 2023-12-23 21:29:51,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. 
limit=22.5 2023-12-23 21:29:57,273 INFO [train.py:886] (1/4) Epoch 43, batch 1100, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24907.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4930293.64 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:30:01,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1341813.3333333333, ans=0.0 2023-12-23 21:30:04,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1341813.3333333333, ans=0.125 2023-12-23 21:30:08,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1341880.0, ans=0.125 2023-12-23 21:30:09,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1341880.0, ans=0.1 2023-12-23 21:30:14,086 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.243e+01 3.738e+01 3.893e+01 4.064e+01 5.194e+01, threshold=7.786e+01, percent-clipped=0.0 2023-12-23 21:30:18,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-12-23 21:30:21,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1341946.6666666667, ans=0.0 2023-12-23 21:30:33,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1342013.3333333333, ans=0.2 2023-12-23 21:30:42,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1342080.0, ans=0.125 2023-12-23 21:30:48,488 INFO [train.py:886] (1/4) Epoch 43, batch 1150, loss[loss=0.01085, audio_tagging_loss=0.01085, over 23997.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4933418.25 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:30:56,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1342146.6666666667, ans=0.1 2023-12-23 21:30:59,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1342213.3333333333, ans=0.0 2023-12-23 21:31:27,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1342346.6666666667, ans=0.125 2023-12-23 21:31:37,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1342413.3333333333, ans=0.0 2023-12-23 21:31:40,786 INFO [train.py:886] (1/4) Epoch 43, batch 1200, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4934972.14 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:31:59,048 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.434e+01 3.771e+01 3.906e+01 4.055e+01 4.735e+01, threshold=7.811e+01, percent-clipped=0.0 2023-12-23 21:32:00,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. 
limit=15.0 2023-12-23 21:32:18,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1342680.0, ans=0.0 2023-12-23 21:32:19,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1342680.0, ans=0.125 2023-12-23 21:32:33,205 INFO [train.py:886] (1/4) Epoch 43, batch 1250, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4937992.20 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:32:50,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0 2023-12-23 21:32:55,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5 2023-12-23 21:33:11,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1343013.3333333333, ans=0.1 2023-12-23 21:33:13,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1343080.0, ans=0.125 2023-12-23 21:33:13,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1343080.0, ans=0.125 2023-12-23 21:33:18,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2023-12-23 21:33:22,784 INFO [train.py:886] (1/4) Epoch 43, batch 1300, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4940695.32 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:33:31,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.59 vs. limit=22.5 2023-12-23 21:33:36,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2023-12-23 21:33:41,676 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.269e+01 3.738e+01 3.919e+01 4.127e+01 4.801e+01, threshold=7.839e+01, percent-clipped=0.0 2023-12-23 21:33:44,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1343280.0, ans=0.0 2023-12-23 21:33:54,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1343346.6666666667, ans=0.125 2023-12-23 21:33:59,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1343346.6666666667, ans=0.125 2023-12-23 21:34:16,076 INFO [train.py:886] (1/4) Epoch 43, batch 1350, loss[loss=0.008286, audio_tagging_loss=0.008286, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4944849.79 frames. 
], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:34:23,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1343480.0, ans=0.125 2023-12-23 21:34:24,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1343546.6666666667, ans=0.05 2023-12-23 21:34:31,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1343546.6666666667, ans=0.0 2023-12-23 21:34:34,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1343546.6666666667, ans=0.0 2023-12-23 21:34:42,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1343613.3333333333, ans=0.125 2023-12-23 21:34:45,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1343613.3333333333, ans=0.07 2023-12-23 21:34:52,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=8.0 2023-12-23 21:34:55,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1343680.0, ans=0.125 2023-12-23 21:34:56,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2023-12-23 21:35:07,993 INFO [train.py:886] (1/4) Epoch 43, batch 1400, loss[loss=0.009362, audio_tagging_loss=0.009362, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4947390.46 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:35:15,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1343813.3333333333, ans=0.0 2023-12-23 21:35:25,465 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.254e+01 3.705e+01 3.872e+01 4.110e+01 5.093e+01, threshold=7.744e+01, percent-clipped=0.0 2023-12-23 21:35:30,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1343946.6666666667, ans=0.0 2023-12-23 21:35:46,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1344013.3333333333, ans=0.125 2023-12-23 21:35:52,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1344080.0, ans=0.125 2023-12-23 21:35:59,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1344146.6666666667, ans=0.0 2023-12-23 21:36:00,020 INFO [train.py:886] (1/4) Epoch 43, batch 1450, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4953640.55 frames. 
], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:36:04,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1344146.6666666667, ans=0.2 2023-12-23 21:36:12,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1344213.3333333333, ans=0.0 2023-12-23 21:36:19,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1344213.3333333333, ans=0.2 2023-12-23 21:36:19,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=12.0 2023-12-23 21:36:29,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344280.0, ans=0.1 2023-12-23 21:36:53,279 INFO [train.py:886] (1/4) Epoch 43, batch 1500, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4957674.47 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:37:08,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1344546.6666666667, ans=0.125 2023-12-23 21:37:09,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.371e+01 3.779e+01 3.922e+01 4.118e+01 4.489e+01, threshold=7.843e+01, percent-clipped=0.0 2023-12-23 21:37:29,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1344680.0, ans=0.1 2023-12-23 21:37:42,886 INFO [train.py:886] (1/4) Epoch 43, batch 1550, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4960893.39 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:37:53,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1344813.3333333333, ans=0.125 2023-12-23 21:37:56,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2023-12-23 21:37:59,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1344880.0, ans=0.2 2023-12-23 21:38:36,343 INFO [train.py:886] (1/4) Epoch 43, batch 1600, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4957811.75 frames. 
], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:38:46,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1345213.3333333333, ans=0.0 2023-12-23 21:38:53,036 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.814e+01 3.969e+01 4.149e+01 4.804e+01, threshold=7.938e+01, percent-clipped=0.0 2023-12-23 21:39:12,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1345346.6666666667, ans=0.0 2023-12-23 21:39:17,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1345413.3333333333, ans=0.125 2023-12-23 21:39:18,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1345413.3333333333, ans=0.0 2023-12-23 21:39:28,158 INFO [train.py:886] (1/4) Epoch 43, batch 1650, loss[loss=0.007971, audio_tagging_loss=0.007971, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4952671.55 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:39:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1345480.0, ans=0.125 2023-12-23 21:39:35,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=22.5 2023-12-23 21:40:00,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2023-12-23 21:40:05,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1345680.0, ans=10.0 2023-12-23 21:40:15,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1345746.6666666667, ans=0.0 2023-12-23 21:40:19,279 INFO [train.py:886] (1/4) Epoch 43, batch 1700, loss[loss=0.011, audio_tagging_loss=0.011, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4954561.03 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:40:32,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1345880.0, ans=0.125 2023-12-23 21:40:38,053 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.392e+01 3.780e+01 3.982e+01 4.194e+01 5.083e+01, threshold=7.964e+01, percent-clipped=0.0 2023-12-23 21:40:47,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1345946.6666666667, ans=0.1 2023-12-23 21:40:49,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. 
limit=6.0 2023-12-23 21:40:57,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1346013.3333333333, ans=0.0 2023-12-23 21:41:07,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1346080.0, ans=0.0 2023-12-23 21:41:07,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1346080.0, ans=0.125 2023-12-23 21:41:11,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1346146.6666666667, ans=0.125 2023-12-23 21:41:12,544 INFO [train.py:886] (1/4) Epoch 43, batch 1750, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4957391.97 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:41:14,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1346146.6666666667, ans=0.2 2023-12-23 21:41:17,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1346146.6666666667, ans=0.125 2023-12-23 21:41:37,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1346280.0, ans=0.125 2023-12-23 21:41:42,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1346346.6666666667, ans=0.0 2023-12-23 21:41:51,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1346346.6666666667, ans=0.125 2023-12-23 21:42:02,809 INFO [train.py:886] (1/4) Epoch 43, batch 1800, loss[loss=0.01105, audio_tagging_loss=0.01105, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4961567.95 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:42:15,504 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:42:18,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1346546.6666666667, ans=0.125 2023-12-23 21:42:21,660 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 3.726e+01 3.927e+01 4.030e+01 4.598e+01, threshold=7.853e+01, percent-clipped=0.0 2023-12-23 21:42:40,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.98 vs. limit=10.0 2023-12-23 21:42:48,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-12-23 21:42:48,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1346746.6666666667, ans=0.0 2023-12-23 21:42:55,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1346813.3333333333, ans=0.1 2023-12-23 21:42:56,179 INFO [train.py:886] (1/4) Epoch 43, batch 1850, loss[loss=0.00956, audio_tagging_loss=0.00956, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4955821.94 frames. 
], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:42:59,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2023-12-23 21:43:37,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2023-12-23 21:43:42,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1347080.0, ans=0.0 2023-12-23 21:43:43,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1347080.0, ans=0.125 2023-12-23 21:43:48,405 INFO [train.py:886] (1/4) Epoch 43, batch 1900, loss[loss=0.01014, audio_tagging_loss=0.01014, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4954089.78 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:43:56,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1347146.6666666667, ans=0.0 2023-12-23 21:43:58,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1347213.3333333333, ans=0.125 2023-12-23 21:44:03,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1347213.3333333333, ans=0.0 2023-12-23 21:44:05,354 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.802e+01 3.982e+01 4.158e+01 4.820e+01, threshold=7.964e+01, percent-clipped=0.0 2023-12-23 21:44:05,555 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:44:24,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1347346.6666666667, ans=0.125 2023-12-23 21:44:27,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-12-23 21:44:39,939 INFO [train.py:886] (1/4) Epoch 43, batch 1950, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4945977.72 frames. 
], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:44:53,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1347546.6666666667, ans=0.125 2023-12-23 21:44:58,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1347546.6666666667, ans=0.1 2023-12-23 21:45:07,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1347613.3333333333, ans=0.125 2023-12-23 21:45:09,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1347613.3333333333, ans=0.2 2023-12-23 21:45:16,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1347680.0, ans=0.125 2023-12-23 21:45:24,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1347746.6666666667, ans=0.0 2023-12-23 21:45:32,965 INFO [train.py:886] (1/4) Epoch 43, batch 2000, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4950424.87 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:45:39,651 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:45:50,015 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.411e+01 3.774e+01 3.907e+01 4.123e+01 6.126e+01, threshold=7.815e+01, percent-clipped=0.0 2023-12-23 21:45:53,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=22.5 2023-12-23 21:46:07,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-12-23 21:46:07,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.40 vs. limit=5.0 2023-12-23 21:46:20,992 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:46:25,181 INFO [train.py:886] (1/4) Epoch 43, batch 2050, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4951751.87 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0 2023-12-23 21:47:06,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1348413.3333333333, ans=0.0 2023-12-23 21:47:17,046 INFO [train.py:886] (1/4) Epoch 43, batch 2100, loss[loss=0.01158, audio_tagging_loss=0.01158, over 21744.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4946232.94 frames. ], batch size: 107, lr: 2.50e-03, grad_scale: 64.0 2023-12-23 21:47:36,274 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+01 3.711e+01 3.880e+01 4.032e+01 4.676e+01, threshold=7.761e+01, percent-clipped=0.0 2023-12-23 21:47:36,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. 
limit=22.5 2023-12-23 21:47:44,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1348613.3333333333, ans=0.125 2023-12-23 21:47:44,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=22.5 2023-12-23 21:47:45,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-12-23 21:48:02,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1348746.6666666667, ans=0.125 2023-12-23 21:48:02,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2023-12-23 21:48:10,534 INFO [train.py:886] (1/4) Epoch 43, batch 2150, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4953349.45 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 64.0 2023-12-23 21:48:11,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1348813.3333333333, ans=0.2 2023-12-23 21:48:34,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-12-23 21:48:35,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.88 vs. limit=15.0 2023-12-23 21:48:41,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1349013.3333333333, ans=15.0 2023-12-23 21:48:53,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1349080.0, ans=0.0 2023-12-23 21:49:01,981 INFO [train.py:886] (1/4) Epoch 43, batch 2200, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4950911.33 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 64.0 2023-12-23 21:49:10,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1349146.6666666667, ans=0.04949747468305833 2023-12-23 21:49:17,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1349213.3333333333, ans=0.1 2023-12-23 21:49:20,202 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.810e+01 3.972e+01 4.173e+01 4.722e+01, threshold=7.943e+01, percent-clipped=0.0 2023-12-23 21:49:37,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1349346.6666666667, ans=0.125 2023-12-23 21:49:38,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1349346.6666666667, ans=0.0 2023-12-23 21:49:50,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1349413.3333333333, ans=0.0 2023-12-23 21:49:54,814 INFO [train.py:886] (1/4) Epoch 43, batch 2250, loss[loss=0.00895, audio_tagging_loss=0.00895, over 25000.00 frames. 
], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4948568.02 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0 2023-12-23 21:50:01,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1349480.0, ans=0.125 2023-12-23 21:50:08,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1349546.6666666667, ans=0.0 2023-12-23 21:50:17,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1349613.3333333333, ans=0.2 2023-12-23 21:50:30,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.01 vs. limit=12.0 2023-12-23 21:50:34,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=1349680.0, ans=22.5 2023-12-23 21:50:48,228 INFO [train.py:886] (1/4) Epoch 43, batch 2300, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4950237.46 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:50:48,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-12-23 21:51:04,570 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.400e+01 3.719e+01 3.860e+01 4.088e+01 4.787e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 21:51:09,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1349946.6666666667, ans=0.125 2023-12-23 21:51:14,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-12-23 21:51:25,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1350013.3333333333, ans=0.125 2023-12-23 21:51:30,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=12.0 2023-12-23 21:51:36,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-12-23 21:51:38,600 INFO [train.py:886] (1/4) Epoch 43, batch 2350, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4949304.06 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:51:51,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1350213.3333333333, ans=0.0 2023-12-23 21:51:51,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1350213.3333333333, ans=0.0 2023-12-23 21:52:30,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1350480.0, ans=0.1 2023-12-23 21:52:31,284 INFO [train.py:886] (1/4) Epoch 43, batch 2400, loss[loss=0.01182, audio_tagging_loss=0.01182, over 21540.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4944831.28 frames. 
], batch size: 107, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:52:43,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.43 vs. limit=10.0 2023-12-23 21:52:44,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1350546.6666666667, ans=0.05 2023-12-23 21:52:47,483 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.759e+01 3.931e+01 4.096e+01 4.502e+01, threshold=7.861e+01, percent-clipped=0.0 2023-12-23 21:53:22,778 INFO [train.py:886] (1/4) Epoch 43, batch 2450, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4950267.05 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:53:30,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1350813.3333333333, ans=0.125 2023-12-23 21:53:57,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351013.3333333333, ans=0.1 2023-12-23 21:54:13,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-12-23 21:54:14,888 INFO [train.py:886] (1/4) Epoch 43, batch 2500, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4943010.76 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:54:26,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1351213.3333333333, ans=0.0 2023-12-23 21:54:32,406 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 3.815e+01 4.045e+01 4.230e+01 4.864e+01, threshold=8.091e+01, percent-clipped=0.0 2023-12-23 21:54:39,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1351280.0, ans=0.0 2023-12-23 21:54:49,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1351346.6666666667, ans=0.125 2023-12-23 21:55:07,179 INFO [train.py:886] (1/4) Epoch 43, batch 2550, loss[loss=0.01044, audio_tagging_loss=0.01044, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4939688.61 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:55:20,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1351546.6666666667, ans=0.125 2023-12-23 21:55:37,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2023-12-23 21:55:41,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1351680.0, ans=0.0 2023-12-23 21:55:41,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1351680.0, ans=0.125 2023-12-23 21:55:57,346 INFO [train.py:886] (1/4) Epoch 43, batch 2600, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. 
], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4929267.84 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:56:03,833 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:56:16,368 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.494e+01 3.793e+01 3.971e+01 4.202e+01 4.613e+01, threshold=7.943e+01, percent-clipped=0.0 2023-12-23 21:56:31,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1352013.3333333333, ans=0.2 2023-12-23 21:56:38,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1352080.0, ans=0.125 2023-12-23 21:56:50,206 INFO [train.py:886] (1/4) Epoch 43, batch 2650, loss[loss=0.01079, audio_tagging_loss=0.01079, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4935679.24 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:57:07,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-12-23 21:57:09,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1352280.0, ans=0.125 2023-12-23 21:57:12,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1352280.0, ans=0.125 2023-12-23 21:57:18,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1352280.0, ans=0.125 2023-12-23 21:57:19,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.35 vs. limit=22.5 2023-12-23 21:57:32,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352413.3333333333, ans=0.1 2023-12-23 21:57:41,550 INFO [train.py:886] (1/4) Epoch 43, batch 2700, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4939541.13 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:57:54,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1352546.6666666667, ans=0.0 2023-12-23 21:57:59,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1352546.6666666667, ans=0.2 2023-12-23 21:57:59,803 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.334e+01 3.701e+01 3.871e+01 4.080e+01 4.871e+01, threshold=7.742e+01, percent-clipped=0.0 2023-12-23 21:58:34,121 INFO [train.py:886] (1/4) Epoch 43, batch 2750, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4944478.18 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:58:43,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352880.0, ans=0.1 2023-12-23 21:58:57,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.39 vs. 
limit=12.0 2023-12-23 21:59:01,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1352946.6666666667, ans=10.0 2023-12-23 21:59:06,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-12-23 21:59:12,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2023-12-23 21:59:13,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1353013.3333333333, ans=0.5 2023-12-23 21:59:16,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1353080.0, ans=0.125 2023-12-23 21:59:20,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1353080.0, ans=0.125 2023-12-23 21:59:24,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1353080.0, ans=0.0 2023-12-23 21:59:26,441 INFO [train.py:886] (1/4) Epoch 43, batch 2800, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4946472.29 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 21:59:43,153 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+01 3.835e+01 3.982e+01 4.160e+01 5.061e+01, threshold=7.963e+01, percent-clipped=0.0 2023-12-23 21:59:53,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1353280.0, ans=0.2 2023-12-23 22:00:18,028 INFO [train.py:886] (1/4) Epoch 43, batch 2850, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4941790.88 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:00:21,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1353480.0, ans=0.125 2023-12-23 22:00:31,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1353546.6666666667, ans=0.125 2023-12-23 22:00:33,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1353546.6666666667, ans=0.2 2023-12-23 22:01:06,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1353746.6666666667, ans=0.2 2023-12-23 22:01:10,329 INFO [train.py:886] (1/4) Epoch 43, batch 2900, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4939947.31 frames. 
], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:01:10,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1353813.3333333333, ans=0.05 2023-12-23 22:01:15,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1353813.3333333333, ans=0.125 2023-12-23 22:01:17,208 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:01:24,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1353880.0, ans=0.125 2023-12-23 22:01:26,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1353880.0, ans=0.125 2023-12-23 22:01:28,113 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.361e+01 3.812e+01 3.915e+01 4.095e+01 4.854e+01, threshold=7.829e+01, percent-clipped=0.0 2023-12-23 22:01:28,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1353880.0, ans=0.125 2023-12-23 22:01:35,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1353946.6666666667, ans=0.0 2023-12-23 22:01:39,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1353946.6666666667, ans=0.1 2023-12-23 22:01:49,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1354013.3333333333, ans=0.125 2023-12-23 22:01:49,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1354013.3333333333, ans=0.0 2023-12-23 22:01:50,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1354080.0, ans=0.2 2023-12-23 22:01:58,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1354080.0, ans=0.125 2023-12-23 22:02:02,228 INFO [train.py:886] (1/4) Epoch 43, batch 2950, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4944335.65 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:02:05,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=12.0 2023-12-23 22:02:11,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1354213.3333333333, ans=0.07 2023-12-23 22:02:22,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.32 vs. 
limit=15.0 2023-12-23 22:02:26,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1354280.0, ans=0.125 2023-12-23 22:02:27,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1354280.0, ans=0.5 2023-12-23 22:02:51,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:02:51,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1354413.3333333333, ans=0.0 2023-12-23 22:02:53,852 INFO [train.py:886] (1/4) Epoch 43, batch 3000, loss[loss=0.009765, audio_tagging_loss=0.009765, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4950406.74 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:02:53,852 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 22:03:01,501 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9518, 5.6292, 5.6501, 5.8767], device='cuda:1') 2023-12-23 22:03:15,298 INFO [train.py:917] (1/4) Epoch 43, validation: loss=0.03559, audio_tagging_loss=0.03559, over 3737520.00 frames. 2023-12-23 22:03:15,299 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 22:03:21,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1354480.0, ans=0.1 2023-12-23 22:03:31,978 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.311e+01 3.762e+01 3.904e+01 4.055e+01 4.746e+01, threshold=7.807e+01, percent-clipped=0.0 2023-12-23 22:03:32,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1354546.6666666667, ans=0.125 2023-12-23 22:03:32,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1354546.6666666667, ans=0.0 2023-12-23 22:03:33,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1354546.6666666667, ans=0.125 2023-12-23 22:03:35,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1354613.3333333333, ans=0.125 2023-12-23 22:03:47,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1354680.0, ans=0.125 2023-12-23 22:03:48,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1354680.0, ans=0.0 2023-12-23 22:04:00,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1354746.6666666667, ans=0.125 2023-12-23 22:04:05,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1354813.3333333333, ans=0.125 2023-12-23 22:04:05,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1354813.3333333333, ans=0.2 2023-12-23 22:04:07,162 INFO [train.py:886] (1/4) Epoch 43, batch 3050, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. 
], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4950916.20 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:04:12,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1354813.3333333333, ans=0.125 2023-12-23 22:04:12,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1354813.3333333333, ans=22.5 2023-12-23 22:04:22,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1354880.0, ans=0.125 2023-12-23 22:04:44,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1355013.3333333333, ans=0.0 2023-12-23 22:04:44,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1355013.3333333333, ans=0.2 2023-12-23 22:04:58,883 INFO [train.py:886] (1/4) Epoch 43, batch 3100, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4951434.20 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:05:06,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1355146.6666666667, ans=0.2 2023-12-23 22:05:17,031 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.431e+01 3.827e+01 3.980e+01 4.191e+01 5.132e+01, threshold=7.960e+01, percent-clipped=0.0 2023-12-23 22:05:31,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1355346.6666666667, ans=0.1 2023-12-23 22:05:51,041 INFO [train.py:886] (1/4) Epoch 43, batch 3150, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4950048.17 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:06:20,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2023-12-23 22:06:22,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1355680.0, ans=0.0 2023-12-23 22:06:23,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-12-23 22:06:34,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2023-12-23 22:06:42,745 INFO [train.py:886] (1/4) Epoch 43, batch 3200, loss[loss=0.009034, audio_tagging_loss=0.009034, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4947696.59 frames. 
], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:06:49,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1355813.3333333333, ans=0.1 2023-12-23 22:06:51,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1355813.3333333333, ans=0.125 2023-12-23 22:06:55,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1355880.0, ans=0.0 2023-12-23 22:07:00,364 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.795e+01 4.020e+01 4.191e+01 4.610e+01, threshold=8.041e+01, percent-clipped=0.0 2023-12-23 22:07:03,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1355946.6666666667, ans=0.125 2023-12-23 22:07:18,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2023-12-23 22:07:23,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1356013.3333333333, ans=0.2 2023-12-23 22:07:26,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.83 vs. limit=10.0 2023-12-23 22:07:33,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-12-23 22:07:34,563 INFO [train.py:886] (1/4) Epoch 43, batch 3250, loss[loss=0.01069, audio_tagging_loss=0.01069, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4950470.70 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:07:48,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1356213.3333333333, ans=0.1 2023-12-23 22:08:04,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1356280.0, ans=0.0 2023-12-23 22:08:08,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1356346.6666666667, ans=0.125 2023-12-23 22:08:17,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1356413.3333333333, ans=0.125 2023-12-23 22:08:27,705 INFO [train.py:886] (1/4) Epoch 43, batch 3300, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4948794.53 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:08:30,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. 
limit=15.0 2023-12-23 22:08:43,758 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.382e+01 3.775e+01 3.931e+01 4.135e+01 5.251e+01, threshold=7.863e+01, percent-clipped=0.0 2023-12-23 22:08:47,562 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:08:47,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1356613.3333333333, ans=0.1 2023-12-23 22:09:01,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1356680.0, ans=0.0 2023-12-23 22:09:16,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1356813.3333333333, ans=0.125 2023-12-23 22:09:17,301 INFO [train.py:886] (1/4) Epoch 43, batch 3350, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4951873.05 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:09:22,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1356813.3333333333, ans=0.125 2023-12-23 22:09:28,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1356880.0, ans=0.125 2023-12-23 22:09:35,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2023-12-23 22:09:46,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2023-12-23 22:09:47,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1356946.6666666667, ans=0.0 2023-12-23 22:09:48,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1357013.3333333333, ans=0.2 2023-12-23 22:09:54,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1357013.3333333333, ans=0.125 2023-12-23 22:10:10,561 INFO [train.py:886] (1/4) Epoch 43, batch 3400, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4946647.29 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:10:14,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=24.65 vs. limit=15.0 2023-12-23 22:10:17,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2023-12-23 22:10:26,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. 
limit=22.5 2023-12-23 22:10:27,284 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.821e+01 3.927e+01 4.185e+01 4.649e+01, threshold=7.854e+01, percent-clipped=0.0 2023-12-23 22:10:27,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1357213.3333333333, ans=0.125 2023-12-23 22:10:33,694 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:10:38,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1357280.0, ans=0.0 2023-12-23 22:10:39,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2023-12-23 22:10:39,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-12-23 22:10:48,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-12-23 22:11:02,485 INFO [train.py:886] (1/4) Epoch 43, batch 3450, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4944497.70 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0 2023-12-23 22:11:23,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1357613.3333333333, ans=0.125 2023-12-23 22:11:24,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-12-23 22:11:28,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1357613.3333333333, ans=0.1 2023-12-23 22:11:28,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1357613.3333333333, ans=0.0 2023-12-23 22:11:34,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1357680.0, ans=0.125 2023-12-23 22:11:54,014 INFO [train.py:886] (1/4) Epoch 43, batch 3500, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4937000.18 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:11:54,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-12-23 22:12:14,014 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.870e+01 4.030e+01 4.254e+01 6.630e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-23 22:12:36,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1358080.0, ans=0.0 2023-12-23 22:12:47,457 INFO [train.py:886] (1/4) Epoch 43, batch 3550, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4939549.33 frames. 
], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:12:47,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1358146.6666666667, ans=0.07 2023-12-23 22:12:49,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.42 vs. limit=22.5 2023-12-23 22:12:58,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2023-12-23 22:13:00,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1358213.3333333333, ans=0.2 2023-12-23 22:13:10,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1358280.0, ans=0.125 2023-12-23 22:13:11,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1358280.0, ans=0.125 2023-12-23 22:13:20,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0 2023-12-23 22:13:20,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1358346.6666666667, ans=0.0 2023-12-23 22:13:30,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1358413.3333333333, ans=0.07 2023-12-23 22:13:38,454 INFO [train.py:886] (1/4) Epoch 43, batch 3600, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4942825.09 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:13:39,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1358480.0, ans=0.2 2023-12-23 22:13:46,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1358480.0, ans=0.125 2023-12-23 22:13:57,055 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.362e+01 3.773e+01 3.900e+01 4.130e+01 5.213e+01, threshold=7.800e+01, percent-clipped=0.0 2023-12-23 22:14:12,467 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:14:18,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1358680.0, ans=0.125 2023-12-23 22:14:20,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-12-23 22:14:30,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=15.0 2023-12-23 22:14:30,416 INFO [train.py:886] (1/4) Epoch 43, batch 3650, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4939649.70 frames. 
], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:14:35,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1358813.3333333333, ans=0.0 2023-12-23 22:15:11,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1359080.0, ans=0.1 2023-12-23 22:15:22,585 INFO [train.py:886] (1/4) Epoch 43, batch 3700, loss[loss=0.01036, audio_tagging_loss=0.01036, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4943066.59 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:15:22,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1359146.6666666667, ans=0.125 2023-12-23 22:15:24,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1359146.6666666667, ans=0.125 2023-12-23 22:15:41,171 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.800e+01 4.029e+01 4.222e+01 4.954e+01, threshold=8.057e+01, percent-clipped=0.0 2023-12-23 22:15:56,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1359346.6666666667, ans=0.0 2023-12-23 22:15:57,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1359346.6666666667, ans=0.125 2023-12-23 22:16:04,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1359413.3333333333, ans=0.04949747468305833 2023-12-23 22:16:06,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1359413.3333333333, ans=0.1 2023-12-23 22:16:14,315 INFO [train.py:886] (1/4) Epoch 43, batch 3750, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4937037.83 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:16:14,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1359480.0, ans=12.0 2023-12-23 22:16:34,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1359546.6666666667, ans=0.5 2023-12-23 22:16:49,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1359680.0, ans=0.0 2023-12-23 22:16:53,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-12-23 22:16:53,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1359680.0, ans=0.125 2023-12-23 22:17:06,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1359813.3333333333, ans=0.1 2023-12-23 22:17:07,194 INFO [train.py:886] (1/4) Epoch 43, batch 3800, loss[loss=0.009365, audio_tagging_loss=0.009365, over 22146.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4934588.88 frames. 
], batch size: 107, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:17:17,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1359880.0, ans=0.1 2023-12-23 22:17:24,806 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.435e+01 3.811e+01 3.969e+01 4.137e+01 5.499e+01, threshold=7.937e+01, percent-clipped=0.0 2023-12-23 22:17:26,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2023-12-23 22:17:48,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1360013.3333333333, ans=0.1 2023-12-23 22:17:53,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1360080.0, ans=0.2 2023-12-23 22:17:54,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1360080.0, ans=0.125 2023-12-23 22:17:55,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2023-12-23 22:17:59,946 INFO [train.py:886] (1/4) Epoch 43, batch 3850, loss[loss=0.009843, audio_tagging_loss=0.009843, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4939946.90 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:18:00,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1360146.6666666667, ans=0.05 2023-12-23 22:18:03,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1360146.6666666667, ans=0.2 2023-12-23 22:18:21,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1360280.0, ans=0.2 2023-12-23 22:18:52,576 INFO [train.py:886] (1/4) Epoch 43, batch 3900, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4944503.36 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0 2023-12-23 22:19:11,867 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.297e+01 3.810e+01 3.949e+01 4.143e+01 4.595e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-23 22:19:16,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1360613.3333333333, ans=0.0 2023-12-23 22:19:25,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1360680.0, ans=0.125 2023-12-23 22:19:30,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1360680.0, ans=0.125 2023-12-23 22:19:45,220 INFO [train.py:886] (1/4) Epoch 43, batch 3950, loss[loss=0.009641, audio_tagging_loss=0.009641, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4946623.74 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:19:55,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1360880.0, ans=0.125 2023-12-23 22:20:12,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=12.0 2023-12-23 22:20:36,390 INFO [train.py:886] (1/4) Epoch 43, batch 4000, loss[loss=0.009921, audio_tagging_loss=0.009921, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4947219.81 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:20:43,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1361146.6666666667, ans=0.0 2023-12-23 22:20:54,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1361213.3333333333, ans=0.125 2023-12-23 22:20:55,674 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.773e+01 3.926e+01 4.187e+01 4.858e+01, threshold=7.853e+01, percent-clipped=0.0 2023-12-23 22:20:55,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1361213.3333333333, ans=0.0 2023-12-23 22:21:13,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.50 vs. limit=22.5 2023-12-23 22:21:25,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1361413.3333333333, ans=0.125 2023-12-23 22:21:28,983 INFO [train.py:886] (1/4) Epoch 43, batch 4050, loss[loss=0.009281, audio_tagging_loss=0.009281, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4947495.47 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:21:32,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1361480.0, ans=0.0 2023-12-23 22:21:36,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1361480.0, ans=0.125 2023-12-23 22:21:52,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1361613.3333333333, ans=0.125 2023-12-23 22:21:53,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1361613.3333333333, ans=0.0 2023-12-23 22:21:59,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1361680.0, ans=0.0 2023-12-23 22:22:06,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1361680.0, ans=0.0 2023-12-23 22:22:09,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1361746.6666666667, ans=0.125 2023-12-23 22:22:09,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1361746.6666666667, ans=0.125 2023-12-23 22:22:17,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1361746.6666666667, ans=0.2 2023-12-23 22:22:21,377 INFO [train.py:886] (1/4) Epoch 43, batch 4100, loss[loss=0.009792, audio_tagging_loss=0.009792, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4950342.48 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:22:26,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1361813.3333333333, ans=0.0 2023-12-23 22:22:39,746 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.848e+01 3.982e+01 4.277e+01 4.961e+01, threshold=7.964e+01, percent-clipped=0.0 2023-12-23 22:23:00,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1362013.3333333333, ans=0.125 2023-12-23 22:23:12,805 INFO [train.py:886] (1/4) Epoch 43, batch 4150, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4948001.91 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:23:13,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1362146.6666666667, ans=0.125 2023-12-23 22:23:13,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2023-12-23 22:23:16,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1362146.6666666667, ans=0.5 2023-12-23 22:23:38,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.40 vs. 
limit=15.0 2023-12-23 22:23:46,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-12-23 22:23:47,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1362346.6666666667, ans=0.05 2023-12-23 22:23:56,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1362413.3333333333, ans=0.125 2023-12-23 22:24:05,202 INFO [train.py:886] (1/4) Epoch 43, batch 4200, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4951611.79 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:24:21,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1362546.6666666667, ans=0.1 2023-12-23 22:24:23,809 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.761e+01 3.920e+01 4.090e+01 4.676e+01, threshold=7.840e+01, percent-clipped=0.0 2023-12-23 22:24:34,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1362613.3333333333, ans=0.1 2023-12-23 22:24:40,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1362680.0, ans=0.0 2023-12-23 22:24:42,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1362680.0, ans=0.125 2023-12-23 22:24:52,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1362746.6666666667, ans=0.125 2023-12-23 22:24:54,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1362746.6666666667, ans=0.125 2023-12-23 22:24:57,376 INFO [train.py:886] (1/4) Epoch 43, batch 4250, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4949989.74 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:24:58,482 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:25:07,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1362880.0, ans=0.125 2023-12-23 22:25:07,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2023-12-23 22:25:31,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1363013.3333333333, ans=0.1 2023-12-23 22:25:47,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1363080.0, ans=0.125 2023-12-23 22:25:49,274 INFO [train.py:886] (1/4) Epoch 43, batch 4300, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4954809.06 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:25:53,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1363146.6666666667, ans=0.125 2023-12-23 22:25:54,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-12-23 22:26:08,483 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.750e+01 3.975e+01 4.135e+01 4.734e+01, threshold=7.950e+01, percent-clipped=0.0 2023-12-23 22:26:18,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1363280.0, ans=0.1 2023-12-23 22:26:20,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1363346.6666666667, ans=0.125 2023-12-23 22:26:32,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1363413.3333333333, ans=0.2 2023-12-23 22:26:41,262 INFO [train.py:886] (1/4) Epoch 43, batch 4350, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4957842.98 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:26:51,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-12-23 22:26:55,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1363546.6666666667, ans=0.05 2023-12-23 22:27:07,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1363613.3333333333, ans=0.04949747468305833 2023-12-23 22:27:10,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1363613.3333333333, ans=0.025 2023-12-23 22:27:14,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1363680.0, ans=0.1 2023-12-23 22:27:17,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1363680.0, ans=0.125 2023-12-23 22:27:19,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1363680.0, ans=0.125 2023-12-23 22:27:22,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.29 vs. limit=10.0 2023-12-23 22:27:32,591 INFO [train.py:886] (1/4) Epoch 43, batch 4400, loss[loss=0.00905, audio_tagging_loss=0.00905, over 22239.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4953397.81 frames. 
], batch size: 107, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:27:52,466 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.866e+01 4.022e+01 4.177e+01 4.881e+01, threshold=8.045e+01, percent-clipped=0.0 2023-12-23 22:28:04,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1364013.3333333333, ans=0.125 2023-12-23 22:28:14,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1364080.0, ans=0.0 2023-12-23 22:28:25,664 INFO [train.py:886] (1/4) Epoch 43, batch 4450, loss[loss=0.009589, audio_tagging_loss=0.009589, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4946562.75 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:28:29,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1364146.6666666667, ans=0.125 2023-12-23 22:28:30,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1364146.6666666667, ans=0.5 2023-12-23 22:28:48,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2023-12-23 22:29:17,692 INFO [train.py:886] (1/4) Epoch 43, batch 4500, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4946946.27 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:29:17,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1364480.0, ans=0.0 2023-12-23 22:29:22,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.97 vs. limit=22.5 2023-12-23 22:29:29,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1364546.6666666667, ans=0.2 2023-12-23 22:29:35,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1364546.6666666667, ans=0.125 2023-12-23 22:29:36,251 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.279e+01 3.827e+01 3.976e+01 4.152e+01 9.618e+01, threshold=7.952e+01, percent-clipped=1.0 2023-12-23 22:29:55,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1364680.0, ans=0.1 2023-12-23 22:30:02,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1364746.6666666667, ans=0.04949747468305833 2023-12-23 22:30:09,425 INFO [train.py:886] (1/4) Epoch 43, batch 4550, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4950864.12 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:30:21,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1364880.0, ans=0.0 2023-12-23 22:30:36,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1364946.6666666667, ans=0.125 2023-12-23 22:30:43,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1365013.3333333333, ans=0.09899494936611666 2023-12-23 22:31:02,239 INFO [train.py:886] (1/4) Epoch 43, batch 4600, loss[loss=0.009976, audio_tagging_loss=0.009976, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4954008.31 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:31:07,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1365146.6666666667, ans=0.125 2023-12-23 22:31:19,362 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.512e+01 3.855e+01 4.026e+01 4.238e+01 4.840e+01, threshold=8.052e+01, percent-clipped=0.0 2023-12-23 22:31:23,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1365280.0, ans=0.2 2023-12-23 22:31:26,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1365280.0, ans=0.0 2023-12-23 22:31:31,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2023-12-23 22:31:35,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1365346.6666666667, ans=0.0 2023-12-23 22:31:35,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1365346.6666666667, ans=0.0 2023-12-23 22:31:36,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1365346.6666666667, ans=0.0 2023-12-23 22:31:52,330 INFO [train.py:886] (1/4) Epoch 43, batch 4650, loss[loss=0.00965, audio_tagging_loss=0.00965, over 25000.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4953887.67 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:32:00,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1365480.0, ans=0.0 2023-12-23 22:32:06,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1365546.6666666667, ans=0.125 2023-12-23 22:32:13,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1365613.3333333333, ans=0.125 2023-12-23 22:32:15,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1365613.3333333333, ans=0.0 2023-12-23 22:32:25,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1365680.0, ans=0.125 2023-12-23 22:32:43,787 INFO [train.py:886] (1/4) Epoch 43, batch 4700, loss[loss=0.01002, audio_tagging_loss=0.01002, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4947413.28 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:32:45,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2023-12-23 22:32:50,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1365813.3333333333, ans=0.125 2023-12-23 22:32:52,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0 2023-12-23 22:32:53,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-12-23 22:33:00,264 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.428e+01 3.819e+01 4.036e+01 4.197e+01 4.891e+01, threshold=8.073e+01, percent-clipped=0.0 2023-12-23 22:33:07,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1365946.6666666667, ans=0.05 2023-12-23 22:33:08,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1365946.6666666667, ans=0.0 2023-12-23 22:33:29,796 INFO [train.py:886] (1/4) Epoch 43, batch 4750, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4943628.61 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:33:30,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1366146.6666666667, ans=0.2 2023-12-23 22:34:06,060 INFO [train.py:886] (1/4) Epoch 44, batch 0, loss[loss=0.025, audio_tagging_loss=0.025, over 23987.00 frames. ], tot_loss[loss=0.025, audio_tagging_loss=0.025, over 23987.00 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:34:06,060 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 22:34:27,395 INFO [train.py:917] (1/4) Epoch 44, validation: loss=0.03574, audio_tagging_loss=0.03574, over 3737520.00 frames. 2023-12-23 22:34:27,396 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 22:34:42,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1366320.0, ans=0.125 2023-12-23 22:35:17,584 INFO [train.py:886] (1/4) Epoch 44, batch 50, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.018, audio_tagging_loss=0.018, over 1122721.94 frames. 
], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:35:20,386 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 4.065e+01 4.610e+01 5.536e+01 1.097e+02, threshold=9.221e+01, percent-clipped=8.0 2023-12-23 22:35:22,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1366586.6666666667, ans=0.0 2023-12-23 22:35:28,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1366653.3333333333, ans=0.1 2023-12-23 22:35:40,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1366720.0, ans=0.0 2023-12-23 22:35:44,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1366720.0, ans=0.125 2023-12-23 22:35:55,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1366786.6666666667, ans=0.0 2023-12-23 22:36:01,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1366853.3333333333, ans=0.125 2023-12-23 22:36:08,395 INFO [train.py:886] (1/4) Epoch 44, batch 100, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 1979674.15 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:36:08,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1366920.0, ans=0.2 2023-12-23 22:36:16,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1366920.0, ans=0.1 2023-12-23 22:36:24,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1366986.6666666667, ans=0.1 2023-12-23 22:36:29,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1367053.3333333333, ans=0.1 2023-12-23 22:36:52,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:56,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:57,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1367186.6666666667, ans=0.0 2023-12-23 22:36:59,355 INFO [train.py:886] (1/4) Epoch 44, batch 150, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 2640572.09 frames. 
], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:37:02,156 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.808e+01 4.071e+01 4.284e+01 4.499e+01 5.493e+01, threshold=8.567e+01, percent-clipped=0.0 2023-12-23 22:37:11,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1367320.0, ans=0.0 2023-12-23 22:37:24,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1367386.6666666667, ans=0.125 2023-12-23 22:37:25,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1367386.6666666667, ans=0.0 2023-12-23 22:37:37,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1367453.3333333333, ans=0.125 2023-12-23 22:37:51,618 INFO [train.py:886] (1/4) Epoch 44, batch 200, loss[loss=0.01031, audio_tagging_loss=0.01031, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 3161886.50 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:38:09,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.05 vs. limit=22.5 2023-12-23 22:38:33,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1367853.3333333333, ans=0.2 2023-12-23 22:38:39,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1367853.3333333333, ans=0.125 2023-12-23 22:38:42,580 INFO [train.py:886] (1/4) Epoch 44, batch 250, loss[loss=0.01032, audio_tagging_loss=0.01032, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 3564051.72 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:38:45,354 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.532e+01 3.886e+01 4.052e+01 4.204e+01 5.117e+01, threshold=8.104e+01, percent-clipped=0.0 2023-12-23 22:38:45,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1367920.0, ans=0.0 2023-12-23 22:38:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1367920.0, ans=0.0 2023-12-23 22:39:09,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1368053.3333333333, ans=0.125 2023-12-23 22:39:20,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1368120.0, ans=0.125 2023-12-23 22:39:22,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2023-12-23 22:39:26,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1368186.6666666667, ans=0.125 2023-12-23 22:39:34,333 INFO [train.py:886] (1/4) Epoch 44, batch 300, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 3871688.48 frames. 
], batch size: 99, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:40:03,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1368386.6666666667, ans=0.0 2023-12-23 22:40:05,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1368453.3333333333, ans=0.125 2023-12-23 22:40:26,291 INFO [train.py:886] (1/4) Epoch 44, batch 350, loss[loss=0.01333, audio_tagging_loss=0.01333, over 24932.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4109706.64 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:40:29,094 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.502e+01 3.809e+01 3.953e+01 4.148e+01 4.528e+01, threshold=7.906e+01, percent-clipped=0.0 2023-12-23 22:40:41,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1368653.3333333333, ans=0.0 2023-12-23 22:40:54,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1368720.0, ans=0.0 2023-12-23 22:40:55,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1368720.0, ans=0.125 2023-12-23 22:41:01,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1368786.6666666667, ans=0.0 2023-12-23 22:41:16,694 INFO [train.py:886] (1/4) Epoch 44, batch 400, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4293909.62 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:41:19,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=15.0 2023-12-23 22:41:31,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0 2023-12-23 22:41:46,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1369120.0, ans=0.125 2023-12-23 22:41:54,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1369120.0, ans=0.0 2023-12-23 22:42:08,313 INFO [train.py:886] (1/4) Epoch 44, batch 450, loss[loss=0.007224, audio_tagging_loss=0.007224, over 23957.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4437671.51 frames. 
], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:42:11,761 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.414e+01 3.763e+01 3.912e+01 4.071e+01 4.674e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 22:42:17,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1369320.0, ans=0.0 2023-12-23 22:42:20,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1369320.0, ans=0.125 2023-12-23 22:42:27,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1369320.0, ans=0.125 2023-12-23 22:42:49,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2023-12-23 22:42:56,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1369520.0, ans=0.1 2023-12-23 22:42:59,946 INFO [train.py:886] (1/4) Epoch 44, batch 500, loss[loss=0.01012, audio_tagging_loss=0.01012, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4551202.73 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:43:19,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1369653.3333333333, ans=0.1 2023-12-23 22:43:22,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1369720.0, ans=0.125 2023-12-23 22:43:41,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2023-12-23 22:43:51,620 INFO [train.py:886] (1/4) Epoch 44, batch 550, loss[loss=0.01099, audio_tagging_loss=0.01099, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4645288.37 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:43:51,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1369920.0, ans=0.125 2023-12-23 22:43:53,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1369920.0, ans=0.125 2023-12-23 22:43:54,464 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.575e+01 3.800e+01 3.977e+01 4.148e+01 4.797e+01, threshold=7.954e+01, percent-clipped=0.0 2023-12-23 22:43:54,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1369920.0, ans=0.125 2023-12-23 22:44:01,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-12-23 22:44:19,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1370053.3333333333, ans=0.0 2023-12-23 22:44:43,224 INFO [train.py:886] (1/4) Epoch 44, batch 600, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4715396.34 frames. 
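The optim.py:484 warnings print five quantiles (min, Q1, median, Q3, max) of recently observed gradient norms together with the clipping threshold in effect; in the lines above the threshold is exactly Clipping_scale=2.0 times the median (e.g. a 3.912e+01 median against a 7.824e+01 threshold), and percent-clipped=0.0 means no batch exceeded it. A hedged sketch of that rule follows; icefall's ScaledAdam does this bookkeeping per parameter group and differs in detail.

import statistics

def clipping_scale(recent_norms, current_norm, mult=2.0):
    """Sketch of median-based gradient clipping: the threshold is a multiple
    (mult, matching the printed Clipping_scale) of the median of recently
    observed gradient norms, and the current gradient is scaled down only if
    it exceeds that threshold."""
    quantiles = ([min(recent_norms)]
                 + statistics.quantiles(recent_norms, n=4)
                 + [max(recent_norms)])     # five numbers, like the log line
    threshold = mult * quantiles[2]         # assumed rule: 2x the median
    scale = min(1.0, threshold / max(current_norm, 1e-20))
    return scale, quantiles, threshold

# Numbers shaped like the warnings above: a typical batch norm sits well
# below the ~8e+01 threshold, so nothing is clipped (percent-clipped=0.0).
norms = [34.1, 37.6, 39.1, 40.7, 46.7]
print(clipping_scale(norms, current_norm=39.5))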
], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:44:59,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-12-23 22:45:15,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1370453.3333333333, ans=0.0 2023-12-23 22:45:23,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-12-23 22:45:34,275 INFO [train.py:886] (1/4) Epoch 44, batch 650, loss[loss=0.009446, audio_tagging_loss=0.009446, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4765880.10 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:45:37,793 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 3.789e+01 3.956e+01 4.142e+01 5.204e+01, threshold=7.912e+01, percent-clipped=0.0 2023-12-23 22:45:44,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-12-23 22:45:50,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1370653.3333333333, ans=0.0 2023-12-23 22:45:52,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1370653.3333333333, ans=0.05 2023-12-23 22:45:59,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1370720.0, ans=0.2 2023-12-23 22:46:04,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1370720.0, ans=0.2 2023-12-23 22:46:09,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1370786.6666666667, ans=0.2 2023-12-23 22:46:14,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1370786.6666666667, ans=0.0 2023-12-23 22:46:26,832 INFO [train.py:886] (1/4) Epoch 44, batch 700, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4804474.53 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:46:59,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1371120.0, ans=0.0 2023-12-23 22:47:05,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-23 22:47:11,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=12.0 2023-12-23 22:47:18,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1371253.3333333333, ans=0.2 2023-12-23 22:47:19,236 INFO [train.py:886] (1/4) Epoch 44, batch 750, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4834264.21 frames. 
], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:47:21,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.757e+01 3.902e+01 4.113e+01 5.017e+01, threshold=7.805e+01, percent-clipped=0.0 2023-12-23 22:47:29,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1371320.0, ans=0.125 2023-12-23 22:47:43,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1371386.6666666667, ans=0.1 2023-12-23 22:47:44,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1371386.6666666667, ans=0.125 2023-12-23 22:48:09,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-12-23 22:48:09,725 INFO [train.py:886] (1/4) Epoch 44, batch 800, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4859205.95 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:48:10,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1371586.6666666667, ans=0.1 2023-12-23 22:48:39,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1371720.0, ans=0.09899494936611666 2023-12-23 22:48:45,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1371786.6666666667, ans=0.125 2023-12-23 22:49:02,939 INFO [train.py:886] (1/4) Epoch 44, batch 850, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4880514.67 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:49:05,729 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.453e+01 3.773e+01 3.934e+01 4.147e+01 6.054e+01, threshold=7.868e+01, percent-clipped=0.0 2023-12-23 22:49:23,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1372053.3333333333, ans=0.125 2023-12-23 22:49:36,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5 2023-12-23 22:49:39,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1372120.0, ans=0.1 2023-12-23 22:49:47,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2023-12-23 22:49:47,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1372186.6666666667, ans=0.125 2023-12-23 22:49:53,210 INFO [train.py:886] (1/4) Epoch 44, batch 900, loss[loss=0.009795, audio_tagging_loss=0.009795, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4897257.90 frames. 
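The scaling.py:1022 Whitening messages fire when a module's feature covariance drifts too far from a scaled identity, i.e. when a flatness metric exceeds the named limit. The function below is a plausible reconstruction rather than a verbatim copy of icefall's Whiten module: the metric equals 1.0 for perfectly whitened features and approaches the per-group dimension when all the energy collapses into one direction, which makes limits like 15.0 or 22.5 readable as "how peaked the spectrum may get".

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Hedged sketch of a whitening metric: for features x of shape
    (num_frames, num_channels), split channels into groups, form each
    group's covariance C, and measure sum(C*C) / (mean_diag(C)**2 * dim).
    The value is 1.0 when C is a multiple of the identity (fully white)
    and larger when the spectrum is skewed."""
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    dim = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, dim).transpose(0, 1)  # (groups, frames, dim)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames     # (groups, dim, dim)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    metric = (covar ** 2).sum() / (mean_diag ** 2 * dim * num_groups)
    return metric.item()

# White noise scores near 1.0, far below the logged limits; fully
# collapsed features (all channels identical) score near dim = 512.
white = torch.randn(10000, 512)
print(whitening_metric(white))                        # ~1.0
print(whitening_metric(white[:, :1].repeat(1, 512)))  # ~512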
], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:49:55,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1372253.3333333333, ans=0.0 2023-12-23 22:50:45,617 INFO [train.py:886] (1/4) Epoch 44, batch 950, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4899261.79 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:50:48,458 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.389e+01 3.853e+01 3.991e+01 4.175e+01 5.097e+01, threshold=7.983e+01, percent-clipped=0.0 2023-12-23 22:50:49,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2023-12-23 22:50:53,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1372586.6666666667, ans=0.125 2023-12-23 22:50:59,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1372653.3333333333, ans=0.125 2023-12-23 22:51:13,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1372720.0, ans=0.0 2023-12-23 22:51:17,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1372786.6666666667, ans=0.125 2023-12-23 22:51:23,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1372786.6666666667, ans=0.125 2023-12-23 22:51:24,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2023-12-23 22:51:25,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1372853.3333333333, ans=0.0 2023-12-23 22:51:25,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1372853.3333333333, ans=0.125 2023-12-23 22:51:38,151 INFO [train.py:886] (1/4) Epoch 44, batch 1000, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4901322.53 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:51:45,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1372920.0, ans=0.1 2023-12-23 22:51:48,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1372986.6666666667, ans=0.125 2023-12-23 22:51:52,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1372986.6666666667, ans=0.125 2023-12-23 22:52:05,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=1373053.3333333333, ans=15.0 2023-12-23 22:52:18,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1373186.6666666667, ans=0.0 2023-12-23 22:52:27,679 INFO [train.py:886] (1/4) Epoch 44, batch 1050, loss[loss=0.009211, audio_tagging_loss=0.009211, over 24023.00 frames. 
], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4907879.15 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:52:31,162 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.448e+01 3.809e+01 4.004e+01 4.174e+01 4.765e+01, threshold=8.009e+01, percent-clipped=0.0 2023-12-23 22:52:46,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1373320.0, ans=0.125 2023-12-23 22:52:48,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1373320.0, ans=0.2 2023-12-23 22:52:51,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1373386.6666666667, ans=0.125 2023-12-23 22:52:56,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2023-12-23 22:52:59,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1373453.3333333333, ans=0.0 2023-12-23 22:53:09,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1373453.3333333333, ans=0.0 2023-12-23 22:53:10,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1373520.0, ans=0.0 2023-12-23 22:53:12,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-12-23 22:53:14,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1373520.0, ans=0.125 2023-12-23 22:53:15,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1373520.0, ans=0.125 2023-12-23 22:53:21,037 INFO [train.py:886] (1/4) Epoch 44, batch 1100, loss[loss=0.01066, audio_tagging_loss=0.01066, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4922931.22 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:53:41,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1373720.0, ans=0.05 2023-12-23 22:53:57,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2023-12-23 22:54:01,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1373853.3333333333, ans=0.125 2023-12-23 22:54:01,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1373853.3333333333, ans=15.0 2023-12-23 22:54:02,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1373853.3333333333, ans=0.125 2023-12-23 22:54:12,606 INFO [train.py:886] (1/4) Epoch 44, batch 1150, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4933545.47 frames. 
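The learning rate has just ticked from 2.45e-03 down to 2.44e-03 in the entries above. That is consistent with icefall's Eden schedule, sketched below with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 (the values this recipe typically uses, treated here as assumptions), two inverse fourth-root decay factors, one in optimizer steps and one in epochs. The step count for epoch 44 is an estimate, not a value read from the log.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Sketch of the Eden learning-rate rule (assumed form)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Roughly 200k optimizer steps into epoch 44 this lands near the
# 2.44e-03 printed above (~2.4e-03).
print(eden_lr(0.045, batch=205_000, epoch=44))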
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:54:16,269 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.466e+01 3.752e+01 3.932e+01 4.115e+01 4.811e+01, threshold=7.864e+01, percent-clipped=0.0 2023-12-23 22:54:22,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1373920.0, ans=0.0 2023-12-23 22:54:37,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1374053.3333333333, ans=0.0 2023-12-23 22:54:55,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1374186.6666666667, ans=0.125 2023-12-23 22:55:04,711 INFO [train.py:886] (1/4) Epoch 44, batch 1200, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4937527.97 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:55:08,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1374253.3333333333, ans=0.0 2023-12-23 22:55:10,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2023-12-23 22:55:18,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1374320.0, ans=0.0 2023-12-23 22:55:20,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1374320.0, ans=0.125 2023-12-23 22:55:29,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1374386.6666666667, ans=0.2 2023-12-23 22:55:34,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1374453.3333333333, ans=0.125 2023-12-23 22:55:56,741 INFO [train.py:886] (1/4) Epoch 44, batch 1250, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4940993.21 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:56:00,326 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.517e+01 3.822e+01 4.031e+01 4.210e+01 4.983e+01, threshold=8.061e+01, percent-clipped=0.0 2023-12-23 22:56:02,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1374586.6666666667, ans=0.125 2023-12-23 22:56:12,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1374653.3333333333, ans=0.025 2023-12-23 22:56:31,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1374786.6666666667, ans=0.0 2023-12-23 22:56:47,077 INFO [train.py:886] (1/4) Epoch 44, batch 1300, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4936629.58 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:57:11,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1375053.3333333333, ans=0.125 2023-12-23 22:57:12,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.37 vs. limit=15.0 2023-12-23 22:57:19,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1375120.0, ans=0.0 2023-12-23 22:57:32,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1375186.6666666667, ans=0.1 2023-12-23 22:57:34,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1375186.6666666667, ans=0.07 2023-12-23 22:57:36,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1375186.6666666667, ans=0.125 2023-12-23 22:57:39,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1375253.3333333333, ans=0.0 2023-12-23 22:57:39,994 INFO [train.py:886] (1/4) Epoch 44, batch 1350, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4941339.87 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:57:42,820 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.513e+01 3.826e+01 3.975e+01 4.146e+01 4.619e+01, threshold=7.951e+01, percent-clipped=0.0 2023-12-23 22:58:21,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1375520.0, ans=0.2 2023-12-23 22:58:21,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.33 vs. limit=15.0 2023-12-23 22:58:31,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1375520.0, ans=0.2 2023-12-23 22:58:32,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2023-12-23 22:58:32,856 INFO [train.py:886] (1/4) Epoch 44, batch 1400, loss[loss=0.009333, audio_tagging_loss=0.009333, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4947547.80 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:58:35,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1375586.6666666667, ans=0.0 2023-12-23 22:58:48,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1375653.3333333333, ans=0.1 2023-12-23 22:58:50,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1375653.3333333333, ans=0.1 2023-12-23 22:59:05,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.20 vs. 
limit=12.0 2023-12-23 22:59:13,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1375853.3333333333, ans=0.125 2023-12-23 22:59:21,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2023-12-23 22:59:23,964 INFO [train.py:886] (1/4) Epoch 44, batch 1450, loss[loss=0.009445, audio_tagging_loss=0.009445, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4951855.00 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:59:26,776 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.506e+01 3.727e+01 3.932e+01 4.156e+01 5.029e+01, threshold=7.864e+01, percent-clipped=0.0 2023-12-23 22:59:41,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1375986.6666666667, ans=0.1 2023-12-23 22:59:46,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1376053.3333333333, ans=0.125 2023-12-23 23:00:00,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1376120.0, ans=0.125 2023-12-23 23:00:02,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1376120.0, ans=0.0 2023-12-23 23:00:04,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1376120.0, ans=0.125 2023-12-23 23:00:14,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1376186.6666666667, ans=0.0 2023-12-23 23:00:14,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1376186.6666666667, ans=0.125 2023-12-23 23:00:16,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=15.0 2023-12-23 23:00:16,702 INFO [train.py:886] (1/4) Epoch 44, batch 1500, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4955297.04 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:00:19,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1376253.3333333333, ans=0.2 2023-12-23 23:00:25,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1376253.3333333333, ans=0.0 2023-12-23 23:00:29,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1376320.0, ans=0.2 2023-12-23 23:00:29,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1376320.0, ans=0.2 2023-12-23 23:00:52,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1376453.3333333333, ans=0.0 2023-12-23 23:00:53,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1376453.3333333333, ans=0.125 2023-12-23 23:00:56,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1376453.3333333333, ans=0.125 2023-12-23 23:01:04,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1376520.0, ans=0.1 2023-12-23 23:01:09,442 INFO [train.py:886] (1/4) Epoch 44, batch 1550, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4946847.25 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:01:13,030 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.562e+01 3.917e+01 4.065e+01 4.220e+01 5.107e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-23 23:01:15,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1376586.6666666667, ans=0.1 2023-12-23 23:01:16,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-12-23 23:01:30,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1376720.0, ans=0.125 2023-12-23 23:01:39,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1376786.6666666667, ans=0.0 2023-12-23 23:01:39,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1376786.6666666667, ans=0.0 2023-12-23 23:01:49,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1376853.3333333333, ans=0.0 2023-12-23 23:01:52,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1376853.3333333333, ans=0.125 2023-12-23 23:01:56,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1376853.3333333333, ans=0.125 2023-12-23 23:02:00,084 INFO [train.py:886] (1/4) Epoch 44, batch 1600, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4947260.77 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:02:16,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1376986.6666666667, ans=0.125 2023-12-23 23:02:19,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1376986.6666666667, ans=0.0 2023-12-23 23:02:35,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1377120.0, ans=0.125 2023-12-23 23:02:42,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1377186.6666666667, ans=0.09899494936611666 2023-12-23 23:02:51,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1377253.3333333333, ans=0.09899494936611666 2023-12-23 23:02:52,359 INFO [train.py:886] (1/4) Epoch 44, batch 1650, loss[loss=0.009842, audio_tagging_loss=0.009842, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4949024.26 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:02:53,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1377253.3333333333, ans=0.125 2023-12-23 23:02:55,191 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.228e+01 3.859e+01 4.003e+01 4.218e+01 7.648e+01, threshold=8.006e+01, percent-clipped=0.0 2023-12-23 23:03:22,168 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-12-23 23:03:25,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1377453.3333333333, ans=0.0 2023-12-23 23:03:26,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1377453.3333333333, ans=0.025 2023-12-23 23:03:31,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1377453.3333333333, ans=0.05 2023-12-23 23:03:35,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1377520.0, ans=0.125 2023-12-23 23:03:43,281 INFO [train.py:886] (1/4) Epoch 44, batch 1700, loss[loss=0.01103, audio_tagging_loss=0.01103, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4951487.93 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:03:59,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1377653.3333333333, ans=0.1 2023-12-23 23:04:11,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1377720.0, ans=0.125 2023-12-23 23:04:35,377 INFO [train.py:886] (1/4) Epoch 44, batch 1750, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4949105.46 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:04:38,158 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.774e+01 3.975e+01 4.116e+01 4.755e+01, threshold=7.949e+01, percent-clipped=0.0 2023-12-23 23:04:42,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-12-23 23:05:06,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1378120.0, ans=0.125 2023-12-23 23:05:15,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1378186.6666666667, ans=0.125 2023-12-23 23:05:27,876 INFO [train.py:886] (1/4) Epoch 44, batch 1800, loss[loss=0.01002, audio_tagging_loss=0.01002, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4954061.18 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:05:32,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1378253.3333333333, ans=0.125 2023-12-23 23:05:46,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1378386.6666666667, ans=0.09899494936611666 2023-12-23 23:05:59,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1378453.3333333333, ans=0.125 2023-12-23 23:05:59,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1378453.3333333333, ans=0.1 2023-12-23 23:06:16,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1378520.0, ans=0.125 2023-12-23 23:06:19,003 INFO [train.py:886] (1/4) Epoch 44, batch 1850, loss[loss=0.01112, audio_tagging_loss=0.01112, over 21752.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4955292.65 frames. ], batch size: 107, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:06:19,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1378586.6666666667, ans=0.125 2023-12-23 23:06:21,834 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.867e+01 4.025e+01 4.218e+01 4.619e+01, threshold=8.051e+01, percent-clipped=0.0 2023-12-23 23:06:32,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1378653.3333333333, ans=0.125 2023-12-23 23:06:37,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1378653.3333333333, ans=0.125 2023-12-23 23:06:40,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1378720.0, ans=0.125 2023-12-23 23:07:06,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1378853.3333333333, ans=0.125 2023-12-23 23:07:10,742 INFO [train.py:886] (1/4) Epoch 44, batch 1900, loss[loss=0.01262, audio_tagging_loss=0.01262, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4951469.87 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:07:21,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1378986.6666666667, ans=0.1 2023-12-23 23:07:24,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.67 vs. limit=22.5 2023-12-23 23:07:51,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0 2023-12-23 23:07:52,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.43 vs. limit=15.0 2023-12-23 23:08:02,080 INFO [train.py:886] (1/4) Epoch 44, batch 1950, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4951013.92 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:08:05,533 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.823e+01 4.045e+01 4.197e+01 4.926e+01, threshold=8.090e+01, percent-clipped=0.0 2023-12-23 23:08:05,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1379253.3333333333, ans=0.0 2023-12-23 23:08:11,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.94 vs. limit=22.5 2023-12-23 23:08:16,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1379320.0, ans=0.0 2023-12-23 23:08:21,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1379386.6666666667, ans=0.125 2023-12-23 23:08:27,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379386.6666666667, ans=0.1 2023-12-23 23:08:42,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379453.3333333333, ans=0.125 2023-12-23 23:08:48,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2023-12-23 23:08:54,358 INFO [train.py:886] (1/4) Epoch 44, batch 2000, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4949638.20 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:09:07,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1379653.3333333333, ans=0.125 2023-12-23 23:09:31,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379786.6666666667, ans=0.125 2023-12-23 23:09:33,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1379786.6666666667, ans=0.1 2023-12-23 23:09:41,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.97 vs. 
limit=12.0 2023-12-23 23:09:46,490 INFO [train.py:886] (1/4) Epoch 44, batch 2050, loss[loss=0.009768, audio_tagging_loss=0.009768, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4950125.14 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:09:49,335 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.403e+01 3.844e+01 3.984e+01 4.167e+01 5.134e+01, threshold=7.969e+01, percent-clipped=0.0 2023-12-23 23:09:52,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-12-23 23:10:18,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1380120.0, ans=0.2 2023-12-23 23:10:28,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1380186.6666666667, ans=0.125 2023-12-23 23:10:29,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1380186.6666666667, ans=0.125 2023-12-23 23:10:35,743 INFO [train.py:886] (1/4) Epoch 44, batch 2100, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4951353.02 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:11:10,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1380453.3333333333, ans=0.2 2023-12-23 23:11:17,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1380520.0, ans=0.125 2023-12-23 23:11:28,055 INFO [train.py:886] (1/4) Epoch 44, batch 2150, loss[loss=0.01024, audio_tagging_loss=0.01024, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4949601.41 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:11:30,902 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.811e+01 3.971e+01 4.151e+01 4.761e+01, threshold=7.942e+01, percent-clipped=0.0 2023-12-23 23:11:39,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1380653.3333333333, ans=0.0 2023-12-23 23:11:41,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1380653.3333333333, ans=0.0 2023-12-23 23:12:00,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.34 vs. limit=10.0 2023-12-23 23:12:15,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1380853.3333333333, ans=0.125 2023-12-23 23:12:16,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1380853.3333333333, ans=0.0 2023-12-23 23:12:18,074 INFO [train.py:886] (1/4) Epoch 44, batch 2200, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4949265.06 frames. 
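grad_scale has stepped 16.0 -> 32.0 -> 64.0 across this section, the signature of float16 loss scaling growing after long runs of overflow-free steps. The generic PyTorch AMP loop below shows the mechanism; the model, dimensions and growth_interval are placeholders for illustration, not this run's actual values (527 is just the AudioSet tag-set size).

import torch

model = torch.nn.Linear(80, 527)      # placeholder model, not the Zipformer
opt = torch.optim.Adam(model.parameters())
# growth_interval is illustrative; by default the scale doubles after 2000
# consecutive steps without inf/nan gradients, halving again on overflow.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000,
                                   enabled=torch.cuda.is_available())

def train_step(feats, targets):
    opt.zero_grad()
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(feats), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales grads, skips the step on overflow
    scaler.update()                # grows/shrinks the scale -> grad_scale in the log
    return loss.item(), scaler.get_scale()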
], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:12:27,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1380920.0, ans=0.2 2023-12-23 23:12:28,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1380986.6666666667, ans=0.09899494936611666 2023-12-23 23:12:30,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1380986.6666666667, ans=0.125 2023-12-23 23:12:35,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1380986.6666666667, ans=0.1 2023-12-23 23:12:42,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2023-12-23 23:12:44,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1381053.3333333333, ans=0.125 2023-12-23 23:13:09,603 INFO [train.py:886] (1/4) Epoch 44, batch 2250, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4946264.50 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:13:11,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1381253.3333333333, ans=0.1 2023-12-23 23:13:12,390 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.814e+01 4.052e+01 4.266e+01 4.733e+01, threshold=8.103e+01, percent-clipped=0.0 2023-12-23 23:13:28,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1381320.0, ans=0.0 2023-12-23 23:13:31,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1381386.6666666667, ans=0.125 2023-12-23 23:13:44,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1381453.3333333333, ans=0.0 2023-12-23 23:13:47,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1381453.3333333333, ans=0.0 2023-12-23 23:13:59,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1381520.0, ans=0.0 2023-12-23 23:14:02,109 INFO [train.py:886] (1/4) Epoch 44, batch 2300, loss[loss=0.008673, audio_tagging_loss=0.008673, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4947449.02 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:14:05,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.82 vs. 
limit=8.0 2023-12-23 23:14:25,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1381720.0, ans=0.125 2023-12-23 23:14:27,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1381720.0, ans=0.0 2023-12-23 23:14:28,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1381720.0, ans=0.1 2023-12-23 23:14:35,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1381786.6666666667, ans=0.125 2023-12-23 23:14:53,651 INFO [train.py:886] (1/4) Epoch 44, batch 2350, loss[loss=0.01038, audio_tagging_loss=0.01038, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4953755.16 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:14:57,185 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.788e+01 3.954e+01 4.114e+01 5.032e+01, threshold=7.908e+01, percent-clipped=0.0 2023-12-23 23:14:58,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1381920.0, ans=10.0 2023-12-23 23:15:38,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1382186.6666666667, ans=0.0 2023-12-23 23:15:46,390 INFO [train.py:886] (1/4) Epoch 44, batch 2400, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4955648.49 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:16:01,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=15.0 2023-12-23 23:16:02,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2023-12-23 23:16:09,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1382386.6666666667, ans=0.035 2023-12-23 23:16:15,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1382386.6666666667, ans=0.2 2023-12-23 23:16:20,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0 2023-12-23 23:16:20,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1382453.3333333333, ans=0.125 2023-12-23 23:16:38,133 INFO [train.py:886] (1/4) Epoch 44, batch 2450, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4957395.87 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:16:38,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.58 vs. 
limit=22.5 2023-12-23 23:16:41,715 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.351e+01 3.765e+01 3.945e+01 4.099e+01 8.510e+01, threshold=7.890e+01, percent-clipped=1.0 2023-12-23 23:17:03,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1382720.0, ans=0.2 2023-12-23 23:17:12,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1382786.6666666667, ans=0.0 2023-12-23 23:17:24,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2023-12-23 23:17:30,037 INFO [train.py:886] (1/4) Epoch 44, batch 2500, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4949726.16 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:17:34,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1382920.0, ans=0.0 2023-12-23 23:17:35,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1382920.0, ans=0.125 2023-12-23 23:17:57,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1383053.3333333333, ans=0.125 2023-12-23 23:18:00,420 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:18:22,369 INFO [train.py:886] (1/4) Epoch 44, batch 2550, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4945688.14 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:18:25,168 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.531e+01 3.915e+01 4.071e+01 4.211e+01 5.058e+01, threshold=8.142e+01, percent-clipped=0.0 2023-12-23 23:18:30,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2023-12-23 23:18:39,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1383320.0, ans=0.2 2023-12-23 23:19:14,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1383586.6666666667, ans=0.125 2023-12-23 23:19:15,030 INFO [train.py:886] (1/4) Epoch 44, batch 2600, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24023.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4936279.31 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:19:19,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.04 vs. 
limit=22.5 2023-12-23 23:19:29,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1383653.3333333333, ans=0.125 2023-12-23 23:19:31,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1383653.3333333333, ans=0.0 2023-12-23 23:19:40,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1383720.0, ans=0.125 2023-12-23 23:19:44,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1383786.6666666667, ans=0.125 2023-12-23 23:19:47,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1383786.6666666667, ans=0.125 2023-12-23 23:19:50,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1383786.6666666667, ans=0.125 2023-12-23 23:20:05,558 INFO [train.py:886] (1/4) Epoch 44, batch 2650, loss[loss=0.01047, audio_tagging_loss=0.01047, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4937116.85 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:20:09,130 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.375e+01 3.827e+01 4.015e+01 4.224e+01 5.047e+01, threshold=8.029e+01, percent-clipped=0.0 2023-12-23 23:20:15,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1383986.6666666667, ans=0.05 2023-12-23 23:20:23,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1383986.6666666667, ans=10.0 2023-12-23 23:20:27,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384053.3333333333, ans=0.1 2023-12-23 23:20:54,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2023-12-23 23:20:59,277 INFO [train.py:886] (1/4) Epoch 44, batch 2700, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4944920.17 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:21:26,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1384386.6666666667, ans=0.125 2023-12-23 23:21:37,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1384453.3333333333, ans=0.125 2023-12-23 23:21:50,604 INFO [train.py:886] (1/4) Epoch 44, batch 2750, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4954398.47 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:21:53,387 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.502e+01 3.801e+01 3.957e+01 4.121e+01 4.471e+01, threshold=7.914e+01, percent-clipped=0.0 2023-12-23 23:22:03,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384653.3333333333, ans=0.1 2023-12-23 23:22:07,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1384653.3333333333, ans=0.1 2023-12-23 23:22:13,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1384720.0, ans=0.0 2023-12-23 23:22:16,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1384720.0, ans=0.1 2023-12-23 23:22:24,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1384786.6666666667, ans=0.0 2023-12-23 23:22:30,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1384786.6666666667, ans=0.125 2023-12-23 23:22:34,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.03 vs. limit=10.0 2023-12-23 23:22:42,931 INFO [train.py:886] (1/4) Epoch 44, batch 2800, loss[loss=0.01212, audio_tagging_loss=0.01212, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4956791.19 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:22:46,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1384920.0, ans=0.125 2023-12-23 23:23:14,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1385120.0, ans=0.95 2023-12-23 23:23:15,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1385120.0, ans=0.125 2023-12-23 23:23:16,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-12-23 23:23:18,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1385120.0, ans=0.125 2023-12-23 23:23:24,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1385186.6666666667, ans=0.125 2023-12-23 23:23:35,053 INFO [train.py:886] (1/4) Epoch 44, batch 2850, loss[loss=0.01024, audio_tagging_loss=0.01024, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4950879.59 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:23:35,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1385253.3333333333, ans=0.125 2023-12-23 23:23:37,928 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.556e+01 3.856e+01 3.997e+01 4.163e+01 4.618e+01, threshold=7.995e+01, percent-clipped=0.0 2023-12-23 23:23:56,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1385386.6666666667, ans=0.0 2023-12-23 23:24:07,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1385453.3333333333, ans=0.125 2023-12-23 23:24:11,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1385453.3333333333, ans=0.125 2023-12-23 23:24:26,129 INFO [train.py:886] (1/4) Epoch 44, batch 2900, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4951043.45 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:24:28,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1385586.6666666667, ans=0.125 2023-12-23 23:24:29,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1385586.6666666667, ans=0.125 2023-12-23 23:24:50,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-23 23:24:54,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. limit=10.0 2023-12-23 23:25:15,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1385853.3333333333, ans=0.125 2023-12-23 23:25:18,702 INFO [train.py:886] (1/4) Epoch 44, batch 2950, loss[loss=0.009731, audio_tagging_loss=0.009731, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4951583.62 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:25:22,473 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.797e+01 3.995e+01 4.163e+01 7.263e+01, threshold=7.990e+01, percent-clipped=0.0 2023-12-23 23:25:24,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1385920.0, ans=0.125 2023-12-23 23:25:45,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1386053.3333333333, ans=0.125 2023-12-23 23:25:48,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1386053.3333333333, ans=0.125 2023-12-23 23:26:10,144 INFO [train.py:886] (1/4) Epoch 44, batch 3000, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4950502.89 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:26:10,145 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 23:26:32,317 INFO [train.py:917] (1/4) Epoch 44, validation: loss=0.03602, audio_tagging_loss=0.03602, over 3737520.00 frames. 2023-12-23 23:26:32,318 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 23:26:33,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1386253.3333333333, ans=0.125 2023-12-23 23:26:53,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1386386.6666666667, ans=0.125 2023-12-23 23:27:09,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1386453.3333333333, ans=0.1 2023-12-23 23:27:18,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1386520.0, ans=0.0 2023-12-23 23:27:24,347 INFO [train.py:886] (1/4) Epoch 44, batch 3050, loss[loss=0.0123, audio_tagging_loss=0.0123, over 22269.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4953168.08 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:27:28,171 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.485e+01 3.888e+01 4.021e+01 4.196e+01 4.723e+01, threshold=8.042e+01, percent-clipped=0.0 2023-12-23 23:27:47,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5 2023-12-23 23:27:48,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1386720.0, ans=0.125 2023-12-23 23:28:16,894 INFO [train.py:886] (1/4) Epoch 44, batch 3100, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4958053.61 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:28:17,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1386920.0, ans=0.125 2023-12-23 23:28:21,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2023-12-23 23:28:21,666 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:28:35,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1386986.6666666667, ans=0.125 2023-12-23 23:28:36,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1387053.3333333333, ans=0.125 2023-12-23 23:28:40,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-12-23 23:28:50,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1387120.0, ans=0.125 2023-12-23 23:28:53,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. 
limit=22.5 2023-12-23 23:29:01,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1387186.6666666667, ans=0.125 2023-12-23 23:29:08,880 INFO [train.py:886] (1/4) Epoch 44, batch 3150, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4952429.94 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:29:09,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1387253.3333333333, ans=0.1 2023-12-23 23:29:12,626 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.491e+01 3.896e+01 4.085e+01 4.176e+01 5.009e+01, threshold=8.169e+01, percent-clipped=0.0 2023-12-23 23:29:22,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1387320.0, ans=0.2 2023-12-23 23:29:39,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-12-23 23:30:01,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1387586.6666666667, ans=0.125 2023-12-23 23:30:01,877 INFO [train.py:886] (1/4) Epoch 44, batch 3200, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4945527.78 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:30:05,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1387586.6666666667, ans=0.1 2023-12-23 23:30:11,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1387653.3333333333, ans=0.2 2023-12-23 23:30:19,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1387653.3333333333, ans=0.1 2023-12-23 23:30:35,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1387786.6666666667, ans=0.2 2023-12-23 23:30:37,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1387786.6666666667, ans=0.04949747468305833 2023-12-23 23:30:43,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1387853.3333333333, ans=0.125 2023-12-23 23:30:48,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=12.0 2023-12-23 23:30:51,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1387920.0, ans=0.125 2023-12-23 23:30:53,193 INFO [train.py:886] (1/4) Epoch 44, batch 3250, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4950178.14 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:30:57,013 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.426e+01 3.832e+01 3.942e+01 4.111e+01 4.749e+01, threshold=7.885e+01, percent-clipped=0.0 2023-12-23 23:31:02,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-12-23 23:31:23,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1388120.0, ans=0.1 2023-12-23 23:31:44,695 INFO [train.py:886] (1/4) Epoch 44, batch 3300, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4955782.07 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:31:48,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-12-23 23:31:50,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1388253.3333333333, ans=0.125 2023-12-23 23:31:54,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1388320.0, ans=0.0 2023-12-23 23:32:13,995 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:32:14,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-12-23 23:32:14,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1388453.3333333333, ans=0.0 2023-12-23 23:32:18,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1388453.3333333333, ans=0.0 2023-12-23 23:32:28,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2023-12-23 23:32:31,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1388520.0, ans=0.0 2023-12-23 23:32:35,943 INFO [train.py:886] (1/4) Epoch 44, batch 3350, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4951462.26 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:32:39,735 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.350e+01 3.780e+01 3.970e+01 4.173e+01 4.809e+01, threshold=7.941e+01, percent-clipped=0.0 2023-12-23 23:32:48,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1388653.3333333333, ans=0.125 2023-12-23 23:32:54,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1388653.3333333333, ans=0.125 2023-12-23 23:32:55,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1388720.0, ans=0.2 2023-12-23 23:33:10,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1388786.6666666667, ans=0.0 2023-12-23 23:33:20,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1388853.3333333333, ans=0.125 2023-12-23 23:33:27,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1388920.0, ans=0.0 2023-12-23 23:33:28,109 INFO [train.py:886] (1/4) Epoch 44, batch 3400, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4951085.45 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:33:37,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1388986.6666666667, ans=0.125 2023-12-23 23:33:49,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.79 vs. limit=10.0 2023-12-23 23:33:53,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1389053.3333333333, ans=0.0 2023-12-23 23:34:00,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1389120.0, ans=0.2 2023-12-23 23:34:17,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1389186.6666666667, ans=0.1 2023-12-23 23:34:20,509 INFO [train.py:886] (1/4) Epoch 44, batch 3450, loss[loss=0.00998, audio_tagging_loss=0.00998, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4941361.63 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:34:24,939 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.641e+01 3.949e+01 4.077e+01 4.251e+01 4.756e+01, threshold=8.154e+01, percent-clipped=0.0 2023-12-23 23:34:28,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1389253.3333333333, ans=0.125 2023-12-23 23:34:38,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1389320.0, ans=0.0 2023-12-23 23:35:04,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1389520.0, ans=0.025 2023-12-23 23:35:12,403 INFO [train.py:886] (1/4) Epoch 44, batch 3500, loss[loss=0.008872, audio_tagging_loss=0.008872, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4940493.03 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:35:12,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1389586.6666666667, ans=0.125 2023-12-23 23:35:28,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1389653.3333333333, ans=0.0 2023-12-23 23:35:58,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1389853.3333333333, ans=0.07 2023-12-23 23:36:04,657 INFO [train.py:886] (1/4) Epoch 44, batch 3550, loss[loss=0.008741, audio_tagging_loss=0.008741, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4938836.22 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:36:08,437 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.423e+01 3.841e+01 4.039e+01 4.182e+01 4.677e+01, threshold=8.078e+01, percent-clipped=0.0 2023-12-23 23:36:16,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1389986.6666666667, ans=0.0 2023-12-23 23:36:17,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5 2023-12-23 23:36:56,423 INFO [train.py:886] (1/4) Epoch 44, batch 3600, loss[loss=0.009018, audio_tagging_loss=0.009018, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4941225.57 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:37:04,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2023-12-23 23:37:07,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5 2023-12-23 23:37:08,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1390320.0, ans=0.125 2023-12-23 23:37:13,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1390320.0, ans=0.1 2023-12-23 23:37:26,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1390386.6666666667, ans=0.1 2023-12-23 23:37:36,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1390453.3333333333, ans=0.125 2023-12-23 23:37:37,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1390520.0, ans=10.0 2023-12-23 23:37:48,518 INFO [train.py:886] (1/4) Epoch 44, batch 3650, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4941591.98 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:37:52,937 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.491e+01 3.828e+01 3.967e+01 4.135e+01 4.611e+01, threshold=7.934e+01, percent-clipped=0.0 2023-12-23 23:37:55,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. 
limit=10.0 2023-12-23 23:38:04,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1390653.3333333333, ans=0.2 2023-12-23 23:38:19,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1390786.6666666667, ans=0.0 2023-12-23 23:38:28,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1390786.6666666667, ans=0.0 2023-12-23 23:38:41,039 INFO [train.py:886] (1/4) Epoch 44, batch 3700, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4952185.38 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:39:01,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1391053.3333333333, ans=0.125 2023-12-23 23:39:07,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2023-12-23 23:39:21,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1391186.6666666667, ans=0.2 2023-12-23 23:39:32,656 INFO [train.py:886] (1/4) Epoch 44, batch 3750, loss[loss=0.009476, audio_tagging_loss=0.009476, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4954620.58 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:39:37,090 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.871e+01 4.041e+01 4.218e+01 6.039e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-23 23:40:01,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1391386.6666666667, ans=0.0 2023-12-23 23:40:03,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1391453.3333333333, ans=0.035 2023-12-23 23:40:24,077 INFO [train.py:886] (1/4) Epoch 44, batch 3800, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4953812.50 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:40:35,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1391653.3333333333, ans=0.025 2023-12-23 23:40:49,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2023-12-23 23:40:53,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1391720.0, ans=0.2 2023-12-23 23:40:56,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2023-12-23 23:41:16,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-12-23 23:41:17,267 INFO [train.py:886] (1/4) Epoch 44, batch 3850, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4954802.43 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:41:21,127 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.492e+01 3.872e+01 4.027e+01 4.207e+01 4.777e+01, threshold=8.053e+01, percent-clipped=0.0 2023-12-23 23:41:31,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1391986.6666666667, ans=0.125 2023-12-23 23:41:38,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1392053.3333333333, ans=0.0 2023-12-23 23:41:44,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2023-12-23 23:41:52,833 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:41:56,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2023-12-23 23:42:02,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0 2023-12-23 23:42:03,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1392186.6666666667, ans=0.0 2023-12-23 23:42:09,413 INFO [train.py:886] (1/4) Epoch 44, batch 3900, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4948563.18 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:42:13,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2023-12-23 23:42:19,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1392320.0, ans=0.0 2023-12-23 23:42:22,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1392320.0, ans=0.125 2023-12-23 23:42:30,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1392386.6666666667, ans=0.125 2023-12-23 23:42:36,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1392386.6666666667, ans=0.1 2023-12-23 23:42:36,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1392386.6666666667, ans=0.1 2023-12-23 23:43:01,306 INFO [train.py:886] (1/4) Epoch 44, batch 3950, loss[loss=0.009674, audio_tagging_loss=0.009674, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4949783.19 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:43:05,141 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.463e+01 3.857e+01 4.007e+01 4.216e+01 4.773e+01, threshold=8.014e+01, percent-clipped=0.0 2023-12-23 23:43:30,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. 
limit=15.0 2023-12-23 23:43:33,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1392786.6666666667, ans=0.0 2023-12-23 23:43:51,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1392853.3333333333, ans=0.125 2023-12-23 23:43:53,642 INFO [train.py:886] (1/4) Epoch 44, batch 4000, loss[loss=0.009645, audio_tagging_loss=0.009645, over 24065.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4952692.44 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:43:55,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1392920.0, ans=0.125 2023-12-23 23:44:11,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2023-12-23 23:44:32,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1393120.0, ans=0.05 2023-12-23 23:44:35,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2023-12-23 23:44:42,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1393186.6666666667, ans=0.04949747468305833 2023-12-23 23:44:44,865 INFO [train.py:886] (1/4) Epoch 44, batch 4050, loss[loss=0.009761, audio_tagging_loss=0.009761, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4959694.56 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:44:48,646 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.503e+01 3.797e+01 4.007e+01 4.202e+01 4.983e+01, threshold=8.013e+01, percent-clipped=0.0 2023-12-23 23:44:48,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1393253.3333333333, ans=0.0 2023-12-23 23:45:10,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1393386.6666666667, ans=0.1 2023-12-23 23:45:17,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.37 vs. limit=22.5 2023-12-23 23:45:34,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1393520.0, ans=0.125 2023-12-23 23:45:35,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2023-12-23 23:45:36,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1393586.6666666667, ans=0.0 2023-12-23 23:45:37,348 INFO [train.py:886] (1/4) Epoch 44, batch 4100, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4949182.17 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:46:09,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. 
limit=15.0 2023-12-23 23:46:29,088 INFO [train.py:886] (1/4) Epoch 44, batch 4150, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4950843.06 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:46:30,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1393920.0, ans=0.0 2023-12-23 23:46:33,523 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.380e+01 3.908e+01 4.057e+01 4.233e+01 4.763e+01, threshold=8.114e+01, percent-clipped=0.0 2023-12-23 23:46:59,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-12-23 23:46:59,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1394120.0, ans=0.125 2023-12-23 23:47:07,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1394120.0, ans=0.125 2023-12-23 23:47:10,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1394186.6666666667, ans=0.125 2023-12-23 23:47:11,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1394186.6666666667, ans=0.2 2023-12-23 23:47:13,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1394186.6666666667, ans=0.125 2023-12-23 23:47:20,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1394253.3333333333, ans=0.1 2023-12-23 23:47:20,981 INFO [train.py:886] (1/4) Epoch 44, batch 4200, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4952454.89 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:47:43,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1394386.6666666667, ans=0.125 2023-12-23 23:47:51,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1394453.3333333333, ans=0.2 2023-12-23 23:47:57,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1394453.3333333333, ans=0.125 2023-12-23 23:47:58,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1394453.3333333333, ans=0.07 2023-12-23 23:48:13,016 INFO [train.py:886] (1/4) Epoch 44, batch 4250, loss[loss=0.009903, audio_tagging_loss=0.009903, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4957848.19 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:48:16,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1394586.6666666667, ans=0.07 2023-12-23 23:48:16,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1394586.6666666667, ans=0.125 2023-12-23 23:48:17,456 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.360e+01 3.794e+01 3.945e+01 4.179e+01 4.749e+01, threshold=7.890e+01, percent-clipped=0.0 2023-12-23 23:48:17,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1394586.6666666667, ans=0.125 2023-12-23 23:48:33,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1394720.0, ans=0.125 2023-12-23 23:48:39,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=22.5 2023-12-23 23:48:58,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1394853.3333333333, ans=0.125 2023-12-23 23:49:00,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0 2023-12-23 23:49:01,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1394853.3333333333, ans=0.0 2023-12-23 23:49:04,135 INFO [train.py:886] (1/4) Epoch 44, batch 4300, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4964429.71 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:49:06,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1394920.0, ans=0.0 2023-12-23 23:49:23,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1394986.6666666667, ans=0.2 2023-12-23 23:49:36,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1395120.0, ans=0.2 2023-12-23 23:49:57,524 INFO [train.py:886] (1/4) Epoch 44, batch 4350, loss[loss=0.0104, audio_tagging_loss=0.0104, over 22660.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4964388.99 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:50:01,300 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.531e+01 3.887e+01 4.029e+01 4.199e+01 5.257e+01, threshold=8.057e+01, percent-clipped=0.0 2023-12-23 23:50:05,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1395253.3333333333, ans=0.125 2023-12-23 23:50:19,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:50:49,095 INFO [train.py:886] (1/4) Epoch 44, batch 4400, loss[loss=0.01006, audio_tagging_loss=0.01006, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4961317.87 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:51:40,103 INFO [train.py:886] (1/4) Epoch 44, batch 4450, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4954987.59 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:51:43,923 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.489e+01 3.885e+01 4.023e+01 4.248e+01 5.191e+01, threshold=8.046e+01, percent-clipped=0.0 2023-12-23 23:51:51,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1395986.6666666667, ans=0.025 2023-12-23 23:51:57,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-12-23 23:51:57,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1395986.6666666667, ans=0.1 2023-12-23 23:52:03,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1396053.3333333333, ans=0.125 2023-12-23 23:52:08,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1396053.3333333333, ans=0.0 2023-12-23 23:52:17,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-23 23:52:32,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1396253.3333333333, ans=0.125 2023-12-23 23:52:33,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2023-12-23 23:52:33,692 INFO [train.py:886] (1/4) Epoch 44, batch 4500, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4954454.84 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:52:57,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1396386.6666666667, ans=0.0 2023-12-23 23:53:11,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1396453.3333333333, ans=0.1 2023-12-23 23:53:11,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1396453.3333333333, ans=0.125 2023-12-23 23:53:24,796 INFO [train.py:886] (1/4) Epoch 44, batch 4550, loss[loss=0.009627, audio_tagging_loss=0.009627, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4958707.46 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:53:28,510 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.460e+01 3.833e+01 3.993e+01 4.205e+01 5.726e+01, threshold=7.986e+01, percent-clipped=0.0 2023-12-23 23:53:34,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.81 vs. 
limit=22.5 2023-12-23 23:53:37,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1396653.3333333333, ans=0.05 2023-12-23 23:53:39,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-12-23 23:54:09,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1396853.3333333333, ans=0.125 2023-12-23 23:54:17,176 INFO [train.py:886] (1/4) Epoch 44, batch 4600, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4962531.21 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:54:28,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1396986.6666666667, ans=0.2 2023-12-23 23:54:30,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1396986.6666666667, ans=0.0 2023-12-23 23:54:43,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1397053.3333333333, ans=0.125 2023-12-23 23:55:05,515 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:55:08,881 INFO [train.py:886] (1/4) Epoch 44, batch 4650, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4965104.06 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:55:13,399 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.838e+01 4.030e+01 4.199e+01 4.777e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-23 23:55:26,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1397320.0, ans=0.125 2023-12-23 23:55:29,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.45 vs. limit=15.0 2023-12-23 23:55:54,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1397520.0, ans=0.2 2023-12-23 23:56:00,029 INFO [train.py:886] (1/4) Epoch 44, batch 4700, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4959091.89 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:56:06,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1397586.6666666667, ans=0.125 2023-12-23 23:56:17,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1397720.0, ans=0.0 2023-12-23 23:56:29,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. 
limit=15.0 2023-12-23 23:56:32,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1397786.6666666667, ans=0.125 2023-12-23 23:56:34,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1397786.6666666667, ans=0.0 2023-12-23 23:56:46,662 INFO [train.py:886] (1/4) Epoch 44, batch 4750, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4947858.89 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:56:50,273 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.598e+01 3.838e+01 4.057e+01 4.238e+01 5.270e+01, threshold=8.115e+01, percent-clipped=0.0 2023-12-23 23:56:51,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2023-12-23 23:56:54,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1397920.0, ans=0.025 2023-12-23 23:57:22,199 INFO [train.py:886] (1/4) Epoch 45, batch 0, loss[loss=0.02617, audio_tagging_loss=0.02617, over 25000.00 frames. ], tot_loss[loss=0.02617, audio_tagging_loss=0.02617, over 25000.00 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 32.0 2023-12-23 23:57:22,199 INFO [train.py:909] (1/4) Computing validation loss 2023-12-23 23:57:43,159 INFO [train.py:917] (1/4) Epoch 45, validation: loss=0.03554, audio_tagging_loss=0.03554, over 3737520.00 frames. 2023-12-23 23:57:43,160 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-23 23:58:02,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2023-12-23 23:58:04,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2023-12-23 23:58:10,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1398160.0, ans=0.0 2023-12-23 23:58:14,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1398226.6666666667, ans=0.125 2023-12-23 23:58:16,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1398226.6666666667, ans=0.125 2023-12-23 23:58:19,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1398226.6666666667, ans=0.1 2023-12-23 23:58:24,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1398293.3333333333, ans=0.2 2023-12-23 23:58:33,603 INFO [train.py:886] (1/4) Epoch 45, batch 50, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01782, audio_tagging_loss=0.01782, over 1114978.98 frames. 
], batch size: 100, lr: 2.40e-03, grad_scale: 16.0 2023-12-23 23:58:54,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1398493.3333333333, ans=0.05 2023-12-23 23:59:06,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1398560.0, ans=0.0 2023-12-23 23:59:14,330 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.685e+01 4.411e+01 4.844e+01 5.631e+01 1.112e+02, threshold=9.688e+01, percent-clipped=7.0 2023-12-23 23:59:14,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1398626.6666666667, ans=0.125 2023-12-23 23:59:23,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2023-12-23 23:59:26,361 INFO [train.py:886] (1/4) Epoch 45, batch 100, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 1960690.41 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0 2023-12-23 23:59:30,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1398693.3333333333, ans=0.125 2023-12-23 23:59:54,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1398826.6666666667, ans=0.125 2023-12-23 23:59:58,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1398893.3333333333, ans=0.0 2023-12-24 00:00:01,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1398893.3333333333, ans=0.2 2023-12-24 00:00:01,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1398893.3333333333, ans=0.0 2023-12-24 00:00:18,186 INFO [train.py:886] (1/4) Epoch 45, batch 150, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 2625565.30 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:00:26,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1399026.6666666667, ans=0.1 2023-12-24 00:00:29,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. 
limit=22.5 2023-12-24 00:00:57,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1399226.6666666667, ans=0.125 2023-12-24 00:00:59,191 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.571e+01 3.963e+01 4.110e+01 4.348e+01 5.500e+01, threshold=8.220e+01, percent-clipped=0.0 2023-12-24 00:01:03,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1399293.3333333333, ans=0.125 2023-12-24 00:01:08,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1399360.0, ans=0.0 2023-12-24 00:01:09,656 INFO [train.py:886] (1/4) Epoch 45, batch 200, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 3143351.96 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:01:09,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1399360.0, ans=0.0 2023-12-24 00:01:18,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1399426.6666666667, ans=0.125 2023-12-24 00:01:39,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2023-12-24 00:01:58,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.61 vs. limit=22.5 2023-12-24 00:02:02,101 INFO [train.py:886] (1/4) Epoch 45, batch 250, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 3553881.78 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:02:05,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1399693.3333333333, ans=0.125 2023-12-24 00:02:08,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1399693.3333333333, ans=0.125 2023-12-24 00:02:09,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0 2023-12-24 00:02:28,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2023-12-24 00:02:42,534 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.460e+01 3.868e+01 4.040e+01 4.212e+01 5.003e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 00:02:43,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-12-24 00:02:50,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1399960.0, ans=0.0 2023-12-24 00:02:53,885 INFO [train.py:886] (1/4) Epoch 45, batch 300, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 3863712.55 frames. 
], batch size: 99, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:03:09,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400093.3333333333, ans=0.1 2023-12-24 00:03:25,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1400226.6666666667, ans=0.125 2023-12-24 00:03:39,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.35 vs. limit=10.0 2023-12-24 00:03:46,177 INFO [train.py:886] (1/4) Epoch 45, batch 350, loss[loss=0.009279, audio_tagging_loss=0.009279, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4098119.74 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:03:50,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1400360.0, ans=0.125 2023-12-24 00:03:50,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0 2023-12-24 00:03:57,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1400426.6666666667, ans=0.125 2023-12-24 00:04:00,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1400426.6666666667, ans=0.125 2023-12-24 00:04:03,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1400426.6666666667, ans=0.2 2023-12-24 00:04:04,968 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:04:15,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1400493.3333333333, ans=0.1 2023-12-24 00:04:18,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1400560.0, ans=0.0 2023-12-24 00:04:22,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1400560.0, ans=0.125 2023-12-24 00:04:24,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1400560.0, ans=0.09899494936611666 2023-12-24 00:04:25,972 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.453e+01 3.862e+01 4.002e+01 4.189e+01 4.449e+01, threshold=8.005e+01, percent-clipped=0.0 2023-12-24 00:04:35,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1400626.6666666667, ans=0.125 2023-12-24 00:04:37,806 INFO [train.py:886] (1/4) Epoch 45, batch 400, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4286351.88 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:04:45,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.50 vs. 
limit=22.5 2023-12-24 00:05:23,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1400960.0, ans=0.125 2023-12-24 00:05:28,533 INFO [train.py:886] (1/4) Epoch 45, batch 450, loss[loss=0.01173, audio_tagging_loss=0.01173, over 22490.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4430477.70 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:05:28,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1401026.6666666667, ans=0.125 2023-12-24 00:05:55,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1401160.0, ans=0.2 2023-12-24 00:05:56,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1401160.0, ans=0.125 2023-12-24 00:06:04,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2023-12-24 00:06:08,955 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.455e+01 3.831e+01 3.994e+01 4.191e+01 6.478e+01, threshold=7.987e+01, percent-clipped=0.0 2023-12-24 00:06:21,069 INFO [train.py:886] (1/4) Epoch 45, batch 500, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4541973.91 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:06:43,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-12-24 00:06:52,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1401560.0, ans=0.125 2023-12-24 00:06:56,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1401560.0, ans=0.5 2023-12-24 00:07:08,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1401626.6666666667, ans=0.2 2023-12-24 00:07:10,611 INFO [train.py:886] (1/4) Epoch 45, batch 550, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4634960.32 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:07:29,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1401760.0, ans=0.07 2023-12-24 00:07:31,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-12-24 00:07:32,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.91 vs. 
limit=22.5 2023-12-24 00:07:35,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1401826.6666666667, ans=0.125 2023-12-24 00:07:41,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1401893.3333333333, ans=0.125 2023-12-24 00:07:50,486 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.468e+01 3.862e+01 4.012e+01 4.255e+01 6.507e+01, threshold=8.024e+01, percent-clipped=0.0 2023-12-24 00:07:51,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1401960.0, ans=0.125 2023-12-24 00:07:53,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2023-12-24 00:08:01,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1402026.6666666667, ans=0.125 2023-12-24 00:08:01,736 INFO [train.py:886] (1/4) Epoch 45, batch 600, loss[loss=0.01025, audio_tagging_loss=0.01025, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4703328.37 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:08:02,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1402026.6666666667, ans=0.0 2023-12-24 00:08:33,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1402226.6666666667, ans=0.125 2023-12-24 00:08:50,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1402293.3333333333, ans=0.125 2023-12-24 00:08:54,246 INFO [train.py:886] (1/4) Epoch 45, batch 650, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4753485.23 frames. 
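
The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines above report module parameters (dropout probabilities, skip rates, balancer probs) whose current value is a function of batch_count; this late in training the values have settled (e.g. ans=0.1, ans=0.125, ans=0.2 repeating unchanged), consistent with schedules that have reached their final breakpoint. Below is a minimal sketch of a piecewise-linear schedule of that kind, assuming linear interpolation between (batch_count, value) breakpoints; the class name and breakpoints are illustrative, not taken from scaling.py.

import bisect

class PiecewiseLinearSchedule:
    """A float whose value is linearly interpolated between (batch_count, value) points."""
    def __init__(self, *points):
        self.xs = [float(p[0]) for p in points]
        self.ys = [float(p[1]) for p in points]

    def __call__(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]          # schedule has flattened, as in the log above
        i = bisect.bisect_right(self.xs, batch_count) - 1
        t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
        return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

# illustrative: a dropout_p that decays from 0.3 to 0.1 early in training
dropout_p = PiecewiseLinearSchedule((0, 0.3), (20000, 0.1))
print(dropout_p(0), dropout_p(10000), dropout_p(1400093))  # 0.3 0.2 0.1
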
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:08:57,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1402360.0, ans=0.2 2023-12-24 00:09:00,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1402360.0, ans=0.1 2023-12-24 00:09:10,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1402426.6666666667, ans=0.125 2023-12-24 00:09:15,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1402493.3333333333, ans=0.07 2023-12-24 00:09:29,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1402560.0, ans=0.0 2023-12-24 00:09:31,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1402560.0, ans=0.2 2023-12-24 00:09:34,587 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.819e+01 4.030e+01 4.235e+01 4.740e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-24 00:09:37,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1402626.6666666667, ans=0.125 2023-12-24 00:09:43,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2023-12-24 00:09:44,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=12.0 2023-12-24 00:09:45,215 INFO [train.py:886] (1/4) Epoch 45, batch 700, loss[loss=0.009144, audio_tagging_loss=0.009144, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4793086.52 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:09:51,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1402693.3333333333, ans=0.125 2023-12-24 00:10:11,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1402826.6666666667, ans=0.125 2023-12-24 00:10:22,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1402893.3333333333, ans=0.1 2023-12-24 00:10:25,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1402960.0, ans=0.125 2023-12-24 00:10:37,465 INFO [train.py:886] (1/4) Epoch 45, batch 750, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4832681.77 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:10:38,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1403026.6666666667, ans=0.1 2023-12-24 00:11:15,687 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.861e+01 4.093e+01 4.221e+01 4.879e+01, threshold=8.187e+01, percent-clipped=0.0 2023-12-24 00:11:26,762 INFO [train.py:886] (1/4) Epoch 45, batch 800, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. 
], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4864421.82 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:11:31,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1403360.0, ans=0.125 2023-12-24 00:11:45,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1403426.6666666667, ans=0.125 2023-12-24 00:12:15,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1403626.6666666667, ans=0.2 2023-12-24 00:12:17,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2023-12-24 00:12:18,621 INFO [train.py:886] (1/4) Epoch 45, batch 850, loss[loss=0.008762, audio_tagging_loss=0.008762, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4889327.80 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:12:20,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2023-12-24 00:12:23,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1403693.3333333333, ans=0.1 2023-12-24 00:12:43,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2023-12-24 00:12:44,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1403826.6666666667, ans=0.1 2023-12-24 00:12:52,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1403893.3333333333, ans=0.125 2023-12-24 00:12:53,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1403893.3333333333, ans=0.0 2023-12-24 00:12:58,730 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.415e+01 3.861e+01 4.035e+01 4.222e+01 4.797e+01, threshold=8.070e+01, percent-clipped=0.0 2023-12-24 00:13:10,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1404026.6666666667, ans=0.125 2023-12-24 00:13:11,515 INFO [train.py:886] (1/4) Epoch 45, batch 900, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4904009.15 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:13:16,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=12.0 2023-12-24 00:14:02,251 INFO [train.py:886] (1/4) Epoch 45, batch 950, loss[loss=0.01079, audio_tagging_loss=0.01079, over 22127.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4909339.62 frames. 
], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:14:24,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1404493.3333333333, ans=0.125 2023-12-24 00:14:39,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1404560.0, ans=0.125 2023-12-24 00:14:40,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1404560.0, ans=0.125 2023-12-24 00:14:43,644 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.636e+01 3.902e+01 4.041e+01 4.266e+01 4.782e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 00:14:54,685 INFO [train.py:886] (1/4) Epoch 45, batch 1000, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4917336.96 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:15:07,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1404760.0, ans=0.0 2023-12-24 00:15:12,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1404760.0, ans=0.1 2023-12-24 00:15:33,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1404893.3333333333, ans=0.125 2023-12-24 00:15:45,806 INFO [train.py:886] (1/4) Epoch 45, batch 1050, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4930023.19 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:15:48,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1405026.6666666667, ans=0.125 2023-12-24 00:15:57,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1405093.3333333333, ans=0.0 2023-12-24 00:16:07,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1405160.0, ans=0.2 2023-12-24 00:16:24,811 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 3.844e+01 4.005e+01 4.224e+01 5.202e+01, threshold=8.010e+01, percent-clipped=0.0 2023-12-24 00:16:36,164 INFO [train.py:886] (1/4) Epoch 45, batch 1100, loss[loss=0.01361, audio_tagging_loss=0.01361, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4937272.81 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:16:37,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1405360.0, ans=0.1 2023-12-24 00:16:56,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1405493.3333333333, ans=0.125 2023-12-24 00:17:06,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.06 vs. 
limit=22.5 2023-12-24 00:17:18,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1405626.6666666667, ans=0.125 2023-12-24 00:17:23,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1405626.6666666667, ans=0.125 2023-12-24 00:17:27,008 INFO [train.py:886] (1/4) Epoch 45, batch 1150, loss[loss=0.008874, audio_tagging_loss=0.008874, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4944585.61 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:17:27,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1405693.3333333333, ans=0.125 2023-12-24 00:17:31,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1405693.3333333333, ans=0.0 2023-12-24 00:17:42,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405760.0, ans=0.1 2023-12-24 00:17:47,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-12-24 00:17:51,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1405826.6666666667, ans=0.125 2023-12-24 00:18:00,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2023-12-24 00:18:05,975 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.457e+01 3.770e+01 3.983e+01 4.144e+01 4.747e+01, threshold=7.965e+01, percent-clipped=0.0 2023-12-24 00:18:07,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1405960.0, ans=0.125 2023-12-24 00:18:17,361 INFO [train.py:886] (1/4) Epoch 45, batch 1200, loss[loss=0.009085, audio_tagging_loss=0.009085, over 22055.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4945682.71 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:18:27,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5 2023-12-24 00:19:07,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1406293.3333333333, ans=0.125 2023-12-24 00:19:09,378 INFO [train.py:886] (1/4) Epoch 45, batch 1250, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4946893.68 frames. 
], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:19:11,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1406360.0, ans=0.125 2023-12-24 00:19:23,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1406426.6666666667, ans=0.125 2023-12-24 00:19:33,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1406493.3333333333, ans=0.125 2023-12-24 00:19:49,848 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.464e+01 3.882e+01 4.077e+01 4.282e+01 6.825e+01, threshold=8.153e+01, percent-clipped=0.0 2023-12-24 00:20:02,551 INFO [train.py:886] (1/4) Epoch 45, batch 1300, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24005.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4936383.54 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:20:19,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1406760.0, ans=0.0 2023-12-24 00:20:46,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-12-24 00:20:49,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1406960.0, ans=0.125 2023-12-24 00:20:53,369 INFO [train.py:886] (1/4) Epoch 45, batch 1350, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4936204.46 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:21:12,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0 2023-12-24 00:21:34,830 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.481e+01 3.816e+01 3.963e+01 4.132e+01 5.053e+01, threshold=7.926e+01, percent-clipped=0.0 2023-12-24 00:21:42,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1407293.3333333333, ans=0.07 2023-12-24 00:21:45,965 INFO [train.py:886] (1/4) Epoch 45, batch 1400, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4943813.72 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:21:59,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-12-24 00:22:00,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1407426.6666666667, ans=0.2 2023-12-24 00:22:08,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1407493.3333333333, ans=0.0 2023-12-24 00:22:10,601 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:22:22,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1407560.0, ans=0.0 2023-12-24 00:22:38,200 INFO [train.py:886] (1/4) Epoch 45, batch 1450, loss[loss=0.01031, audio_tagging_loss=0.01031, over 25000.00 frames. 
], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4948011.49 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:22:52,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-12-24 00:23:05,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1407826.6666666667, ans=0.125 2023-12-24 00:23:18,724 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.497e+01 3.851e+01 3.995e+01 4.151e+01 4.657e+01, threshold=7.989e+01, percent-clipped=0.0 2023-12-24 00:23:22,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1407960.0, ans=0.0 2023-12-24 00:23:28,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1408026.6666666667, ans=0.025 2023-12-24 00:23:29,325 INFO [train.py:886] (1/4) Epoch 45, batch 1500, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4949446.70 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:23:30,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1408026.6666666667, ans=0.0 2023-12-24 00:23:32,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1408026.6666666667, ans=0.125 2023-12-24 00:23:34,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1408026.6666666667, ans=0.1 2023-12-24 00:23:49,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.46 vs. limit=10.0 2023-12-24 00:23:55,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1408160.0, ans=0.0 2023-12-24 00:24:00,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1408226.6666666667, ans=0.0 2023-12-24 00:24:22,037 INFO [train.py:886] (1/4) Epoch 45, batch 1550, loss[loss=0.01037, audio_tagging_loss=0.01037, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4941400.40 frames. 
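
The "Whitening: name=..., metric=X vs. limit=Y" lines compare a measured activation statistic against a per-module limit, and are logged when the metric approaches or crosses it (e.g. metric=15.12 vs. limit=15.0 a little further down). One plausible reading, sketched below under assumptions, is a measure of how far the per-group channel covariance is from a scaled identity: 1.0 for perfectly white activations, larger when a few directions dominate. This is an interpretation, not a copy of scaling.py.

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels); channels split into num_groups groups
    n, c = x.shape
    g = c // num_groups
    xg = x.reshape(n, num_groups, g).transpose(0, 1)             # (groups, n, g)
    cov = xg.transpose(1, 2) @ xg / n                            # per-group covariance
    mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1)   # trace(C) / g
    mean_sq_eig = (cov ** 2).sum(dim=(1, 2)) / g                 # trace(C^2) / g
    return (mean_sq_eig / mean_eig ** 2).mean().item()           # >= 1.0

x = torch.randn(2000, 256)
print(whitening_metric(x))                                  # near 1 (sampling noise pushes it above)
print(whitening_metric(x * torch.linspace(0.1, 3.0, 256)))  # anisotropic channels: clearly larger
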
], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:24:48,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1408493.3333333333, ans=0.125 2023-12-24 00:24:51,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1408493.3333333333, ans=0.1 2023-12-24 00:24:55,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1408560.0, ans=0.125 2023-12-24 00:25:00,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1408560.0, ans=0.1 2023-12-24 00:25:01,865 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.911e+01 4.058e+01 4.249e+01 4.989e+01, threshold=8.116e+01, percent-clipped=0.0 2023-12-24 00:25:07,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1408626.6666666667, ans=0.125 2023-12-24 00:25:13,079 INFO [train.py:886] (1/4) Epoch 45, batch 1600, loss[loss=0.0111, audio_tagging_loss=0.0111, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4935004.55 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:25:33,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1408826.6666666667, ans=0.125 2023-12-24 00:25:38,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1408826.6666666667, ans=0.04949747468305833 2023-12-24 00:25:54,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1408960.0, ans=0.0 2023-12-24 00:25:59,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-12-24 00:26:05,276 INFO [train.py:886] (1/4) Epoch 45, batch 1650, loss[loss=0.01127, audio_tagging_loss=0.01127, over 23961.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4933757.06 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:26:15,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-12-24 00:26:18,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1409093.3333333333, ans=0.125 2023-12-24 00:26:20,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1409093.3333333333, ans=0.0 2023-12-24 00:26:45,327 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.532e+01 3.821e+01 4.016e+01 4.279e+01 4.895e+01, threshold=8.031e+01, percent-clipped=0.0 2023-12-24 00:26:58,854 INFO [train.py:886] (1/4) Epoch 45, batch 1700, loss[loss=0.01005, audio_tagging_loss=0.01005, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4938272.77 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:27:10,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2023-12-24 00:27:33,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1409560.0, ans=0.125 2023-12-24 00:27:34,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1409560.0, ans=0.1 2023-12-24 00:27:49,113 INFO [train.py:886] (1/4) Epoch 45, batch 1750, loss[loss=0.01052, audio_tagging_loss=0.01052, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4945942.55 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:27:50,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1409693.3333333333, ans=0.125 2023-12-24 00:27:56,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1409693.3333333333, ans=0.1 2023-12-24 00:27:56,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1409693.3333333333, ans=0.0 2023-12-24 00:28:29,630 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.800e+01 3.993e+01 4.173e+01 4.854e+01, threshold=7.987e+01, percent-clipped=0.0 2023-12-24 00:28:31,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1409960.0, ans=0.2 2023-12-24 00:28:35,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1409960.0, ans=0.2 2023-12-24 00:28:42,199 INFO [train.py:886] (1/4) Epoch 45, batch 1800, loss[loss=0.009138, audio_tagging_loss=0.009138, over 21491.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4946002.38 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:29:01,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1410160.0, ans=0.125 2023-12-24 00:29:17,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1410226.6666666667, ans=0.125 2023-12-24 00:29:19,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1410226.6666666667, ans=0.1 2023-12-24 00:29:22,255 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:29:22,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1410293.3333333333, ans=0.0 2023-12-24 00:29:26,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1410293.3333333333, ans=0.0 2023-12-24 00:29:27,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1410293.3333333333, ans=0.125 2023-12-24 00:29:32,417 INFO [train.py:886] (1/4) Epoch 45, batch 1850, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4944007.16 frames. 
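
The "tot_loss[loss=..., over N frames.]" fields are a smoothed view of training loss rather than a cumulative total: the frame counts are fractional (e.g. 4944007.16 frames) and drift up and down around ~4.95M, which points to an exponentially decayed, frame-weighted running sum. A sketch of that pattern follows; the decay constant is an assumption, not taken from train.py.

class RunningLoss:
    def __init__(self, decay=0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # decay old statistics, then fold in the new batch, weighted by frames
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames, self.frames

avg = RunningLoss()
for loss, frames in [(0.0115, 25000), (0.0108, 24750), (0.0112, 25000)]:
    tot, n = avg.update(loss, frames)
    print(f"tot_loss[loss={tot:.5f}, over {n:.2f} frames.]")
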
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:29:33,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1410360.0, ans=0.0 2023-12-24 00:29:51,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1410426.6666666667, ans=0.0 2023-12-24 00:29:54,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.11 vs. limit=6.0 2023-12-24 00:30:00,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1410493.3333333333, ans=0.2 2023-12-24 00:30:14,390 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.903e+01 4.080e+01 4.257e+01 5.183e+01, threshold=8.160e+01, percent-clipped=0.0 2023-12-24 00:30:21,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1410626.6666666667, ans=0.1 2023-12-24 00:30:24,983 INFO [train.py:886] (1/4) Epoch 45, batch 1900, loss[loss=0.01024, audio_tagging_loss=0.01024, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4936751.88 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:30:38,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1410760.0, ans=0.125 2023-12-24 00:30:57,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1410893.3333333333, ans=15.0 2023-12-24 00:31:08,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-12-24 00:31:15,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1410960.0, ans=0.0 2023-12-24 00:31:16,975 INFO [train.py:886] (1/4) Epoch 45, batch 1950, loss[loss=0.008774, audio_tagging_loss=0.008774, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4936521.39 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:31:17,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1411026.6666666667, ans=0.07 2023-12-24 00:31:22,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1411026.6666666667, ans=0.0 2023-12-24 00:31:30,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1411093.3333333333, ans=0.0 2023-12-24 00:31:31,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. 
limit=22.5 2023-12-24 00:31:32,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1411093.3333333333, ans=0.125 2023-12-24 00:31:37,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1411160.0, ans=0.125 2023-12-24 00:31:56,107 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.534e+01 3.819e+01 3.949e+01 4.162e+01 4.750e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-24 00:32:00,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1411293.3333333333, ans=0.125 2023-12-24 00:32:06,743 INFO [train.py:886] (1/4) Epoch 45, batch 2000, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4941046.05 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:32:10,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2023-12-24 00:32:17,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1411426.6666666667, ans=0.125 2023-12-24 00:32:31,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1411493.3333333333, ans=0.07 2023-12-24 00:32:34,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1411493.3333333333, ans=0.125 2023-12-24 00:32:39,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1411560.0, ans=0.1 2023-12-24 00:32:59,698 INFO [train.py:886] (1/4) Epoch 45, batch 2050, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4947037.04 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:33:08,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1411760.0, ans=0.0 2023-12-24 00:33:39,579 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.466e+01 3.826e+01 3.966e+01 4.166e+01 5.288e+01, threshold=7.932e+01, percent-clipped=0.0 2023-12-24 00:33:42,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411960.0, ans=0.0 2023-12-24 00:33:46,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1411960.0, ans=0.125 2023-12-24 00:33:51,014 INFO [train.py:886] (1/4) Epoch 45, batch 2100, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4946649.12 frames. 
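
The grad_scale field climbs from 16.0 through 32.0 to 64.0 across this stretch of the log, the signature of dynamic loss scaling in mixed-precision training: the scaler doubles its factor after a run of overflow-free steps and halves it when gradients hit inf/nan. A minimal self-contained sketch of the standard torch.cuda.amp pattern; the tiny linear model and dimensions are placeholders, not the zipformer.

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 10).to(device)      # toy stand-in for the real model
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(3):
    x = torch.randn(8, 128, device=device)
    y = torch.randint(0, 2, (8, 10), device=device).float()
    opt.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)       # skipped internally if gradients overflowed
    scaler.update()        # grows the scale when steps have been stable
    print("grad_scale:", scaler.get_scale())
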
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:33:52,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1412026.6666666667, ans=0.125 2023-12-24 00:34:03,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1412093.3333333333, ans=0.1 2023-12-24 00:34:14,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1412160.0, ans=0.1 2023-12-24 00:34:33,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1412293.3333333333, ans=0.125 2023-12-24 00:34:43,224 INFO [train.py:886] (1/4) Epoch 45, batch 2150, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4952044.28 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:34:52,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1412426.6666666667, ans=0.0 2023-12-24 00:35:16,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1412560.0, ans=0.125 2023-12-24 00:35:17,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1412560.0, ans=0.2 2023-12-24 00:35:18,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1412560.0, ans=0.125 2023-12-24 00:35:22,848 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.885e+01 4.074e+01 4.292e+01 5.969e+01, threshold=8.149e+01, percent-clipped=0.0 2023-12-24 00:35:32,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1412626.6666666667, ans=0.125 2023-12-24 00:35:34,964 INFO [train.py:886] (1/4) Epoch 45, batch 2200, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4948399.14 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:35:43,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1412693.3333333333, ans=0.125 2023-12-24 00:35:45,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1412760.0, ans=0.1 2023-12-24 00:35:50,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1412760.0, ans=0.95 2023-12-24 00:35:59,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5 2023-12-24 00:36:03,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. 
limit=6.0 2023-12-24 00:36:03,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1412826.6666666667, ans=0.125 2023-12-24 00:36:16,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1412960.0, ans=0.125 2023-12-24 00:36:18,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1412960.0, ans=0.125 2023-12-24 00:36:20,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1412960.0, ans=0.0 2023-12-24 00:36:24,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1413026.6666666667, ans=0.125 2023-12-24 00:36:25,241 INFO [train.py:886] (1/4) Epoch 45, batch 2250, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4941782.06 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:36:34,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1413026.6666666667, ans=0.0 2023-12-24 00:36:39,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1413093.3333333333, ans=0.5 2023-12-24 00:36:40,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1413093.3333333333, ans=0.125 2023-12-24 00:36:54,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1413160.0, ans=0.1 2023-12-24 00:37:02,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1413226.6666666667, ans=0.0 2023-12-24 00:37:06,938 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.863e+01 4.040e+01 4.242e+01 4.611e+01, threshold=8.080e+01, percent-clipped=0.0 2023-12-24 00:37:20,260 INFO [train.py:886] (1/4) Epoch 45, batch 2300, loss[loss=0.009639, audio_tagging_loss=0.009639, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4947613.00 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:37:30,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1413426.6666666667, ans=0.125 2023-12-24 00:37:47,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1413493.3333333333, ans=0.125 2023-12-24 00:37:58,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1413560.0, ans=0.0 2023-12-24 00:38:04,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2023-12-24 00:38:12,019 INFO [train.py:886] (1/4) Epoch 45, batch 2350, loss[loss=0.009776, audio_tagging_loss=0.009776, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4948173.79 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:38:29,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0 2023-12-24 00:38:33,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2023-12-24 00:38:41,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1413893.3333333333, ans=0.0 2023-12-24 00:38:43,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1413893.3333333333, ans=0.1 2023-12-24 00:38:51,853 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.613e+01 3.856e+01 3.996e+01 4.169e+01 4.627e+01, threshold=7.993e+01, percent-clipped=0.0 2023-12-24 00:38:53,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=12.0 2023-12-24 00:39:02,977 INFO [train.py:886] (1/4) Epoch 45, batch 2400, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4948875.12 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:39:03,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-12-24 00:39:06,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1414026.6666666667, ans=0.1 2023-12-24 00:39:09,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1414026.6666666667, ans=0.0 2023-12-24 00:39:16,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.10 vs. limit=22.5 2023-12-24 00:39:17,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1414093.3333333333, ans=0.1 2023-12-24 00:39:27,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-12-24 00:39:29,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1414160.0, ans=0.1 2023-12-24 00:39:40,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1414226.6666666667, ans=0.0 2023-12-24 00:39:52,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1414293.3333333333, ans=0.2 2023-12-24 00:39:54,296 INFO [train.py:886] (1/4) Epoch 45, batch 2450, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4955646.83 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:40:02,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1414360.0, ans=0.0 2023-12-24 00:40:16,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1414493.3333333333, ans=0.1 2023-12-24 00:40:23,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1414493.3333333333, ans=0.0 2023-12-24 00:40:27,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1414560.0, ans=0.2 2023-12-24 00:40:33,309 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.484e+01 3.936e+01 4.079e+01 4.274e+01 6.379e+01, threshold=8.158e+01, percent-clipped=0.0 2023-12-24 00:40:39,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1414626.6666666667, ans=0.025 2023-12-24 00:40:43,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1414693.3333333333, ans=0.125 2023-12-24 00:40:44,586 INFO [train.py:886] (1/4) Epoch 45, batch 2500, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4948740.13 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:40:59,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1414760.0, ans=0.1 2023-12-24 00:41:32,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1414960.0, ans=0.125 2023-12-24 00:41:36,877 INFO [train.py:886] (1/4) Epoch 45, batch 2550, loss[loss=0.0103, audio_tagging_loss=0.0103, over 24750.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4946138.75 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:41:44,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1415026.6666666667, ans=0.2 2023-12-24 00:41:50,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.40 vs. limit=10.0 2023-12-24 00:41:53,905 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:42:13,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1415226.6666666667, ans=0.125 2023-12-24 00:42:17,204 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.482e+01 3.969e+01 4.108e+01 4.307e+01 5.107e+01, threshold=8.216e+01, percent-clipped=0.0 2023-12-24 00:42:28,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1415293.3333333333, ans=0.0 2023-12-24 00:42:29,763 INFO [train.py:886] (1/4) Epoch 45, batch 2600, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4952217.78 frames. 
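
Many of the scheduled values above belong to "balancer" modules, with fields like min_positive=0.025, max_positive=0.95, min_abs=0.5 alongside a prob. A plausible mechanism, sketched under assumptions, is an identity in the forward pass that adds a small corrective term to the gradient of any channel whose fraction of positive activations drifts outside [min_positive, max_positive]; the constants and class below are illustrative, not the scaling.py implementation.

import torch

class SimpleBalancer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, min_positive=0.05, max_positive=0.95, strength=1e-4):
        ctx.save_for_backward(x)
        ctx.bounds = (min_positive, max_positive, strength)
        return x                                   # identity in the forward pass

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        lo, hi, s = ctx.bounds
        pos = (x > 0).float().mean(dim=0)          # per-channel positive rate
        push_up = (pos < lo).float()               # too few positives: raise x
        push_down = (pos > hi).float()             # too many positives: lower x
        # a positive gradient term makes SGD decrease x, hence the sign below
        nudge = s * (push_down - push_up)
        return grad + nudge, None, None, None

x = torch.randn(32, 16, requires_grad=True)
SimpleBalancer.apply(x).sum().backward()
print(x.grad.shape)  # same shape as x; mostly ones plus tiny per-channel nudges
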
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:42:30,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1415360.0, ans=0.0 2023-12-24 00:42:41,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1415426.6666666667, ans=0.0 2023-12-24 00:42:49,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1415493.3333333333, ans=0.0 2023-12-24 00:43:03,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1415560.0, ans=0.125 2023-12-24 00:43:08,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1415560.0, ans=0.125 2023-12-24 00:43:20,989 INFO [train.py:886] (1/4) Epoch 45, batch 2650, loss[loss=0.009462, audio_tagging_loss=0.009462, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4956743.71 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:43:37,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2023-12-24 00:43:49,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2023-12-24 00:43:54,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1415893.3333333333, ans=0.0 2023-12-24 00:43:55,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1415893.3333333333, ans=0.0 2023-12-24 00:44:02,334 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.298e+01 3.813e+01 3.941e+01 4.164e+01 4.704e+01, threshold=7.881e+01, percent-clipped=0.0 2023-12-24 00:44:13,729 INFO [train.py:886] (1/4) Epoch 45, batch 2700, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4954541.53 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:44:48,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1416226.6666666667, ans=0.125 2023-12-24 00:44:52,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1416226.6666666667, ans=0.2 2023-12-24 00:44:54,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1416293.3333333333, ans=0.125 2023-12-24 00:45:05,369 INFO [train.py:886] (1/4) Epoch 45, batch 2750, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4956595.99 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:45:08,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1416360.0, ans=0.125 2023-12-24 00:45:10,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1416360.0, ans=0.125 2023-12-24 00:45:33,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1416493.3333333333, ans=0.2 2023-12-24 00:45:46,095 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.826e+01 4.053e+01 4.238e+01 4.704e+01, threshold=8.107e+01, percent-clipped=0.0 2023-12-24 00:45:56,555 INFO [train.py:886] (1/4) Epoch 45, batch 2800, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4955220.08 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:45:56,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1416693.3333333333, ans=0.125 2023-12-24 00:46:00,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2023-12-24 00:46:22,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1416826.6666666667, ans=0.05 2023-12-24 00:46:34,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1416893.3333333333, ans=0.125 2023-12-24 00:46:49,810 INFO [train.py:886] (1/4) Epoch 45, batch 2850, loss[loss=0.00859, audio_tagging_loss=0.00859, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4952526.77 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:47:00,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1417093.3333333333, ans=0.05 2023-12-24 00:47:16,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1417160.0, ans=0.2 2023-12-24 00:47:23,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1417226.6666666667, ans=0.125 2023-12-24 00:47:23,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1417226.6666666667, ans=0.2 2023-12-24 00:47:29,706 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.527e+01 3.915e+01 4.123e+01 4.264e+01 5.597e+01, threshold=8.246e+01, percent-clipped=0.0 2023-12-24 00:47:40,342 INFO [train.py:886] (1/4) Epoch 45, batch 2900, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4947650.07 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:47:40,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1417360.0, ans=0.1 2023-12-24 00:47:57,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.39 vs. 
limit=15.0 2023-12-24 00:48:07,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1417493.3333333333, ans=0.07 2023-12-24 00:48:07,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1417493.3333333333, ans=0.95 2023-12-24 00:48:27,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1417626.6666666667, ans=0.2 2023-12-24 00:48:28,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1417626.6666666667, ans=0.125 2023-12-24 00:48:32,215 INFO [train.py:886] (1/4) Epoch 45, batch 2950, loss[loss=0.009807, audio_tagging_loss=0.009807, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4947081.07 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:48:54,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1417826.6666666667, ans=0.125 2023-12-24 00:49:02,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1417893.3333333333, ans=0.025 2023-12-24 00:49:10,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1417893.3333333333, ans=0.1 2023-12-24 00:49:10,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1417893.3333333333, ans=0.125 2023-12-24 00:49:11,470 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.530e+01 3.805e+01 3.987e+01 4.227e+01 4.629e+01, threshold=7.974e+01, percent-clipped=0.0 2023-12-24 00:49:23,969 INFO [train.py:886] (1/4) Epoch 45, batch 3000, loss[loss=0.008632, audio_tagging_loss=0.008632, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4954548.90 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:49:23,969 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 00:49:45,382 INFO [train.py:917] (1/4) Epoch 45, validation: loss=0.03669, audio_tagging_loss=0.03669, over 3737520.00 frames. 2023-12-24 00:49:45,382 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 00:49:47,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1418026.6666666667, ans=0.04949747468305833 2023-12-24 00:50:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1418160.0, ans=0.0 2023-12-24 00:50:17,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5 2023-12-24 00:50:26,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1418293.3333333333, ans=0.09899494936611666 2023-12-24 00:50:32,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-12-24 00:50:36,121 INFO [train.py:886] (1/4) Epoch 45, batch 3050, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. 
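
The "Computing validation loss" / "validation: loss=0.03669 ... over 3737520.00 frames" pair just above comes from a periodic held-out pass over a fixed dev set (the same 3737520.00 frame total each time), run with gradients disabled and the model in eval mode. A sketch of that loop; model, valid_loader and compute_loss are placeholder names, not the train.py API.

import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:                             # fixed dev set
        loss, frames = compute_loss(model, batch, device)  # placeholder helper
        tot_loss += float(loss) * frames
        tot_frames += frames
    model.train()
    return tot_loss / tot_frames                           # frame-weighted average
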
], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4954207.41 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:50:41,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1418360.0, ans=0.0 2023-12-24 00:50:45,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1418426.6666666667, ans=0.0 2023-12-24 00:51:05,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1418560.0, ans=0.125 2023-12-24 00:51:15,234 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.857e+01 4.011e+01 4.213e+01 4.793e+01, threshold=8.022e+01, percent-clipped=0.0 2023-12-24 00:51:24,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1418626.6666666667, ans=0.125 2023-12-24 00:51:27,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1418693.3333333333, ans=0.125 2023-12-24 00:51:27,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1418693.3333333333, ans=0.125 2023-12-24 00:51:28,653 INFO [train.py:886] (1/4) Epoch 45, batch 3100, loss[loss=0.01057, audio_tagging_loss=0.01057, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4957184.21 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:52:18,793 INFO [train.py:886] (1/4) Epoch 45, batch 3150, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4950749.62 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:52:23,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-12-24 00:52:32,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=15.0 2023-12-24 00:52:35,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1419093.3333333333, ans=0.2 2023-12-24 00:52:36,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1419093.3333333333, ans=0.125 2023-12-24 00:52:38,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1419093.3333333333, ans=0.0 2023-12-24 00:53:00,683 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.707e+01 3.938e+01 4.100e+01 4.263e+01 4.919e+01, threshold=8.199e+01, percent-clipped=0.0 2023-12-24 00:53:01,904 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:53:07,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1419293.3333333333, ans=0.07 2023-12-24 00:53:11,859 INFO [train.py:886] (1/4) Epoch 45, batch 3200, loss[loss=0.01089, audio_tagging_loss=0.01089, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4948412.13 frames. 
], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:53:28,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1419426.6666666667, ans=0.1 2023-12-24 00:53:35,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1419493.3333333333, ans=0.2 2023-12-24 00:53:39,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1419493.3333333333, ans=0.2 2023-12-24 00:53:40,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2023-12-24 00:54:03,207 INFO [train.py:886] (1/4) Epoch 45, batch 3250, loss[loss=0.009289, audio_tagging_loss=0.009289, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4951487.30 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:54:18,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1419760.0, ans=0.125 2023-12-24 00:54:43,802 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.456e+01 3.834e+01 4.007e+01 4.238e+01 5.618e+01, threshold=8.014e+01, percent-clipped=0.0 2023-12-24 00:54:45,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-12-24 00:54:55,295 INFO [train.py:886] (1/4) Epoch 45, batch 3300, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4952048.77 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:54:56,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1420026.6666666667, ans=0.125 2023-12-24 00:55:13,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1420093.3333333333, ans=0.125 2023-12-24 00:55:46,846 INFO [train.py:886] (1/4) Epoch 45, batch 3350, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4955034.47 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:55:49,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1420360.0, ans=0.125 2023-12-24 00:55:56,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1420426.6666666667, ans=0.125 2023-12-24 00:55:58,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1420426.6666666667, ans=0.0 2023-12-24 00:56:06,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1420493.3333333333, ans=0.125 2023-12-24 00:56:14,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1420493.3333333333, ans=0.125 2023-12-24 00:56:17,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=15.0 2023-12-24 00:56:26,531 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.924e+01 4.094e+01 4.263e+01 4.631e+01, threshold=8.187e+01, percent-clipped=0.0 2023-12-24 00:56:36,981 INFO [train.py:886] (1/4) Epoch 45, batch 3400, loss[loss=0.009593, audio_tagging_loss=0.009593, over 21352.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4952478.52 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:56:37,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1420693.3333333333, ans=0.2 2023-12-24 00:56:42,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1420693.3333333333, ans=0.125 2023-12-24 00:57:07,567 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:57:13,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1420893.3333333333, ans=10.0 2023-12-24 00:57:23,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-12-24 00:57:29,308 INFO [train.py:886] (1/4) Epoch 45, batch 3450, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4946577.75 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:57:40,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1421093.3333333333, ans=0.125 2023-12-24 00:57:52,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-24 00:58:08,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.963e+01 4.132e+01 4.315e+01 4.821e+01, threshold=8.264e+01, percent-clipped=0.0 2023-12-24 00:58:20,524 INFO [train.py:886] (1/4) Epoch 45, batch 3500, loss[loss=0.01091, audio_tagging_loss=0.01091, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4941927.35 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:58:22,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1421360.0, ans=0.1 2023-12-24 00:58:31,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1421426.6666666667, ans=0.125 2023-12-24 00:59:10,980 INFO [train.py:886] (1/4) Epoch 45, batch 3550, loss[loss=0.009974, audio_tagging_loss=0.009974, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4942135.65 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:59:17,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1421693.3333333333, ans=0.2 2023-12-24 00:59:26,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1421760.0, ans=0.0 2023-12-24 00:59:32,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1421826.6666666667, ans=0.05 2023-12-24 00:59:50,933 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.379e+01 3.787e+01 4.000e+01 4.230e+01 4.921e+01, threshold=7.999e+01, percent-clipped=0.0 2023-12-24 00:59:55,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1421960.0, ans=0.2 2023-12-24 01:00:02,149 INFO [train.py:886] (1/4) Epoch 45, batch 3600, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4943343.00 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 01:00:06,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-12-24 01:00:07,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1422026.6666666667, ans=0.2 2023-12-24 01:00:20,006 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:00:23,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.47 vs. limit=22.5 2023-12-24 01:00:34,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1422226.6666666667, ans=10.0 2023-12-24 01:00:53,736 INFO [train.py:886] (1/4) Epoch 45, batch 3650, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4946074.99 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 01:00:56,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1422360.0, ans=0.5 2023-12-24 01:01:05,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1422426.6666666667, ans=0.125 2023-12-24 01:01:10,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-12-24 01:01:34,294 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.468e+01 3.852e+01 3.987e+01 4.174e+01 4.561e+01, threshold=7.974e+01, percent-clipped=0.0 2023-12-24 01:01:37,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1422626.6666666667, ans=0.125 2023-12-24 01:01:44,935 INFO [train.py:886] (1/4) Epoch 45, batch 3700, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4950431.69 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:01:46,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1422693.3333333333, ans=0.1 2023-12-24 01:01:55,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1422760.0, ans=0.125 2023-12-24 01:02:00,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1422760.0, ans=0.0 2023-12-24 01:02:24,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1422960.0, ans=0.125 2023-12-24 01:02:37,424 INFO [train.py:886] (1/4) Epoch 45, batch 3750, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4953163.54 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:02:38,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1423026.6666666667, ans=0.125 2023-12-24 01:02:50,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1423093.3333333333, ans=0.125 2023-12-24 01:03:07,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1423160.0, ans=0.125 2023-12-24 01:03:09,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1423226.6666666667, ans=0.0 2023-12-24 01:03:11,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1423226.6666666667, ans=0.1 2023-12-24 01:03:17,400 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+01 3.874e+01 4.070e+01 4.272e+01 4.635e+01, threshold=8.140e+01, percent-clipped=0.0 2023-12-24 01:03:28,537 INFO [train.py:886] (1/4) Epoch 45, batch 3800, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4948210.60 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:03:30,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1423360.0, ans=0.125 2023-12-24 01:03:44,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1423426.6666666667, ans=0.125 2023-12-24 01:04:03,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1423560.0, ans=0.0 2023-12-24 01:04:20,880 INFO [train.py:886] (1/4) Epoch 45, batch 3850, loss[loss=0.009868, audio_tagging_loss=0.009868, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4947760.56 frames. 
], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:04:31,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1423760.0, ans=0.07 2023-12-24 01:04:46,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1423826.6666666667, ans=0.125 2023-12-24 01:04:48,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=1423826.6666666667, ans=0.2 2023-12-24 01:04:59,991 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.544e+01 3.849e+01 4.042e+01 4.188e+01 4.936e+01, threshold=8.085e+01, percent-clipped=0.0 2023-12-24 01:05:12,424 INFO [train.py:886] (1/4) Epoch 45, batch 3900, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4948553.23 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:05:17,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-24 01:05:21,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1424093.3333333333, ans=0.2 2023-12-24 01:05:26,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1424093.3333333333, ans=0.1 2023-12-24 01:05:51,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1424293.3333333333, ans=0.125 2023-12-24 01:06:01,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2023-12-24 01:06:01,967 INFO [train.py:886] (1/4) Epoch 45, batch 3950, loss[loss=0.009846, audio_tagging_loss=0.009846, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4953617.27 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:06:10,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2023-12-24 01:06:14,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1424426.6666666667, ans=0.125 2023-12-24 01:06:33,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-12-24 01:06:35,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.23 vs. limit=22.5 2023-12-24 01:06:42,077 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.512e+01 3.875e+01 4.019e+01 4.169e+01 5.128e+01, threshold=8.038e+01, percent-clipped=0.0 2023-12-24 01:06:53,886 INFO [train.py:886] (1/4) Epoch 45, batch 4000, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4956449.35 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:06:57,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1424693.3333333333, ans=0.0 2023-12-24 01:07:01,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1424693.3333333333, ans=0.125 2023-12-24 01:07:03,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1424760.0, ans=10.0 2023-12-24 01:07:20,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1424826.6666666667, ans=0.0 2023-12-24 01:07:43,270 INFO [train.py:886] (1/4) Epoch 45, batch 4050, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4961292.21 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:07:57,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1425093.3333333333, ans=0.0 2023-12-24 01:08:00,650 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:08:02,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1425093.3333333333, ans=0.125 2023-12-24 01:08:12,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1425160.0, ans=0.125 2023-12-24 01:08:23,526 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:08:25,215 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.455e+01 3.868e+01 4.107e+01 4.296e+01 5.422e+01, threshold=8.214e+01, percent-clipped=0.0 2023-12-24 01:08:34,918 INFO [train.py:886] (1/4) Epoch 45, batch 4100, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4958958.01 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:08:40,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1425360.0, ans=0.125 2023-12-24 01:08:51,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1425426.6666666667, ans=0.025 2023-12-24 01:08:53,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1425426.6666666667, ans=0.125 2023-12-24 01:09:09,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1425560.0, ans=0.0 2023-12-24 01:09:12,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-12-24 01:09:14,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1425560.0, ans=0.0 2023-12-24 01:09:25,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. 
limit=22.5 2023-12-24 01:09:27,206 INFO [train.py:886] (1/4) Epoch 45, batch 4150, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4949069.89 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:09:29,198 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:09:39,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1425760.0, ans=0.0 2023-12-24 01:09:49,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1425826.6666666667, ans=0.125 2023-12-24 01:10:06,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1425960.0, ans=0.125 2023-12-24 01:10:07,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=15.0 2023-12-24 01:10:07,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=12.0 2023-12-24 01:10:08,340 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.433e+01 3.920e+01 4.056e+01 4.290e+01 4.928e+01, threshold=8.113e+01, percent-clipped=0.0 2023-12-24 01:10:16,933 INFO [train.py:886] (1/4) Epoch 45, batch 4200, loss[loss=0.01016, audio_tagging_loss=0.01016, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4951952.37 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:10:19,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1426026.6666666667, ans=0.0 2023-12-24 01:10:20,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2023-12-24 01:10:29,920 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:10:40,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-12-24 01:11:08,473 INFO [train.py:886] (1/4) Epoch 45, batch 4250, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4952243.38 frames. 
], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:11:21,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1426426.6666666667, ans=0.0 2023-12-24 01:11:29,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1426493.3333333333, ans=0.125 2023-12-24 01:11:33,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1426493.3333333333, ans=0.0 2023-12-24 01:11:36,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1426493.3333333333, ans=0.1 2023-12-24 01:11:46,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1426560.0, ans=0.125 2023-12-24 01:11:47,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1426626.6666666667, ans=0.125 2023-12-24 01:11:49,248 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.608e+01 3.890e+01 4.019e+01 4.191e+01 4.680e+01, threshold=8.038e+01, percent-clipped=0.0 2023-12-24 01:11:58,749 INFO [train.py:886] (1/4) Epoch 45, batch 4300, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4957862.46 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:12:13,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1426760.0, ans=0.0 2023-12-24 01:12:20,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1426826.6666666667, ans=0.125 2023-12-24 01:12:23,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1426826.6666666667, ans=0.0 2023-12-24 01:12:29,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-12-24 01:12:39,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1426960.0, ans=0.125 2023-12-24 01:12:48,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1426960.0, ans=0.125 2023-12-24 01:12:50,107 INFO [train.py:886] (1/4) Epoch 45, batch 4350, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4958003.84 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:12:57,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1427026.6666666667, ans=0.0 2023-12-24 01:13:00,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. 
limit=15.0 2023-12-24 01:13:20,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1427226.6666666667, ans=0.125 2023-12-24 01:13:28,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1427226.6666666667, ans=0.0 2023-12-24 01:13:29,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2023-12-24 01:13:31,698 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.980e+01 4.130e+01 4.328e+01 5.553e+01, threshold=8.260e+01, percent-clipped=0.0 2023-12-24 01:13:31,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1427293.3333333333, ans=0.2 2023-12-24 01:13:43,006 INFO [train.py:886] (1/4) Epoch 45, batch 4400, loss[loss=0.01022, audio_tagging_loss=0.01022, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4951058.51 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:13:50,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1427360.0, ans=0.1 2023-12-24 01:13:54,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1427426.6666666667, ans=0.0 2023-12-24 01:14:12,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1427560.0, ans=0.125 2023-12-24 01:14:15,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1427560.0, ans=0.125 2023-12-24 01:14:17,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1427560.0, ans=0.125 2023-12-24 01:14:21,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.71 vs. limit=22.5 2023-12-24 01:14:26,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1427626.6666666667, ans=0.0 2023-12-24 01:14:30,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1427626.6666666667, ans=0.125 2023-12-24 01:14:32,959 INFO [train.py:886] (1/4) Epoch 45, batch 4450, loss[loss=0.01027, audio_tagging_loss=0.01027, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4942373.38 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:14:35,056 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:14:54,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. 
limit=6.0 2023-12-24 01:15:01,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1427826.6666666667, ans=0.125 2023-12-24 01:15:15,499 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.412e+01 3.945e+01 4.084e+01 4.273e+01 5.400e+01, threshold=8.168e+01, percent-clipped=0.0 2023-12-24 01:15:21,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1427960.0, ans=0.1 2023-12-24 01:15:24,902 INFO [train.py:886] (1/4) Epoch 45, batch 4500, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4941609.30 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:15:45,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2023-12-24 01:15:46,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1428160.0, ans=0.05 2023-12-24 01:15:49,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=12.0 2023-12-24 01:16:15,452 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:16:17,045 INFO [train.py:886] (1/4) Epoch 45, batch 4550, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4945841.00 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:16:31,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1428426.6666666667, ans=0.125 2023-12-24 01:16:42,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1428493.3333333333, ans=0.2 2023-12-24 01:16:53,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1428560.0, ans=10.0 2023-12-24 01:16:57,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-12-24 01:16:59,850 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.365e+01 3.907e+01 4.021e+01 4.185e+01 4.612e+01, threshold=8.042e+01, percent-clipped=0.0 2023-12-24 01:17:05,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1428626.6666666667, ans=0.1 2023-12-24 01:17:08,556 INFO [train.py:886] (1/4) Epoch 45, batch 4600, loss[loss=0.008836, audio_tagging_loss=0.008836, over 21804.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4949752.59 frames. 
], batch size: 107, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:17:22,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1428760.0, ans=0.1 2023-12-24 01:17:26,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1428760.0, ans=0.125 2023-12-24 01:17:31,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1428826.6666666667, ans=0.07 2023-12-24 01:17:58,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1428960.0, ans=0.1 2023-12-24 01:17:58,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1428960.0, ans=0.1 2023-12-24 01:18:00,689 INFO [train.py:886] (1/4) Epoch 45, batch 4650, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4954843.69 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:18:00,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1429026.6666666667, ans=0.07 2023-12-24 01:18:08,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1429026.6666666667, ans=0.0 2023-12-24 01:18:09,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1429026.6666666667, ans=0.1 2023-12-24 01:18:42,117 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.658e+01 3.928e+01 4.124e+01 4.353e+01 4.841e+01, threshold=8.247e+01, percent-clipped=0.0 2023-12-24 01:18:50,472 INFO [train.py:886] (1/4) Epoch 45, batch 4700, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4955682.60 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:19:02,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1429426.6666666667, ans=0.1 2023-12-24 01:19:37,547 INFO [train.py:886] (1/4) Epoch 45, batch 4750, loss[loss=0.01101, audio_tagging_loss=0.01101, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4955399.43 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:20:09,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1429800.0, ans=0.0 2023-12-24 01:20:09,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2023-12-24 01:20:13,330 INFO [train.py:886] (1/4) Epoch 46, batch 0, loss[loss=0.02151, audio_tagging_loss=0.02151, over 24104.00 frames. ], tot_loss[loss=0.02151, audio_tagging_loss=0.02151, over 24104.00 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:20:13,330 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 01:20:24,852 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1128, 3.6196, 3.8866, 3.9134], device='cuda:1') 2023-12-24 01:20:34,594 INFO [train.py:917] (1/4) Epoch 46, validation: loss=0.03601, audio_tagging_loss=0.03601, over 3737520.00 frames. 2023-12-24 01:20:34,594 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 01:20:43,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1429866.6666666667, ans=0.1 2023-12-24 01:21:00,402 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 4.025e+01 4.232e+01 5.097e+01 1.112e+02, threshold=8.463e+01, percent-clipped=5.0 2023-12-24 01:21:02,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1429933.3333333333, ans=0.0 2023-12-24 01:21:06,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1430000.0, ans=0.125 2023-12-24 01:21:06,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.33 vs. limit=10.0 2023-12-24 01:21:16,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1430066.6666666667, ans=0.0 2023-12-24 01:21:25,406 INFO [train.py:886] (1/4) Epoch 46, batch 50, loss[loss=0.01684, audio_tagging_loss=0.01684, over 25000.00 frames. ], tot_loss[loss=0.01784, audio_tagging_loss=0.01784, over 1119514.66 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:21:32,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-12-24 01:21:40,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-12-24 01:21:53,308 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:22:09,727 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:22:12,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1430400.0, ans=0.07 2023-12-24 01:22:13,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-12-24 01:22:17,819 INFO [train.py:886] (1/4) Epoch 46, batch 100, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 1973106.67 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:22:29,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1430533.3333333333, ans=0.0 2023-12-24 01:22:43,335 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.772e+01 4.262e+01 4.601e+01 4.856e+01 5.800e+01, threshold=9.203e+01, percent-clipped=0.0 2023-12-24 01:22:43,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1430600.0, ans=0.0 2023-12-24 01:22:47,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1430666.6666666667, ans=0.125 2023-12-24 01:23:09,996 INFO [train.py:886] (1/4) Epoch 46, batch 150, loss[loss=0.009931, audio_tagging_loss=0.009931, over 21762.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 2634704.43 frames. ], batch size: 107, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:23:30,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1430933.3333333333, ans=0.0 2023-12-24 01:23:56,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1431066.6666666667, ans=0.125 2023-12-24 01:24:01,175 INFO [train.py:886] (1/4) Epoch 46, batch 200, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 3147690.35 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:24:06,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1431133.3333333333, ans=0.125 2023-12-24 01:24:12,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1431200.0, ans=0.1 2023-12-24 01:24:19,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1431200.0, ans=0.0 2023-12-24 01:24:26,296 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.683e+01 3.918e+01 4.124e+01 4.291e+01 5.491e+01, threshold=8.249e+01, percent-clipped=0.0 2023-12-24 01:24:27,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1431266.6666666667, ans=0.2 2023-12-24 01:24:31,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1431333.3333333333, ans=0.2 2023-12-24 01:24:51,868 INFO [train.py:886] (1/4) Epoch 46, batch 250, loss[loss=0.009686, audio_tagging_loss=0.009686, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 3550251.70 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:24:57,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1431466.6666666667, ans=0.0 2023-12-24 01:25:07,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1431533.3333333333, ans=0.5 2023-12-24 01:25:23,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1431666.6666666667, ans=0.1 2023-12-24 01:25:25,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1431666.6666666667, ans=0.125 2023-12-24 01:25:28,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1431666.6666666667, ans=0.125 2023-12-24 01:25:39,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1431733.3333333333, ans=0.05 2023-12-24 01:25:42,514 INFO [train.py:886] (1/4) Epoch 46, batch 300, loss[loss=0.01049, audio_tagging_loss=0.01049, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 3859630.45 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:25:45,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1431800.0, ans=0.125 2023-12-24 01:25:46,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1431800.0, ans=0.0 2023-12-24 01:26:02,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=22.5 2023-12-24 01:26:08,049 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.903e+01 4.066e+01 4.292e+01 4.827e+01, threshold=8.132e+01, percent-clipped=0.0 2023-12-24 01:26:10,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1431933.3333333333, ans=0.04949747468305833 2023-12-24 01:26:10,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1431933.3333333333, ans=0.125 2023-12-24 01:26:28,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1432066.6666666667, ans=0.125 2023-12-24 01:26:33,591 INFO [train.py:886] (1/4) Epoch 46, batch 350, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4093878.40 frames. 
], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:26:33,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1432133.3333333333, ans=0.2 2023-12-24 01:26:39,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1432133.3333333333, ans=0.1 2023-12-24 01:26:43,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1432200.0, ans=0.1 2023-12-24 01:27:04,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1432333.3333333333, ans=0.95 2023-12-24 01:27:06,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1432333.3333333333, ans=0.125 2023-12-24 01:27:07,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1432333.3333333333, ans=0.0 2023-12-24 01:27:21,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1432400.0, ans=0.0 2023-12-24 01:27:26,609 INFO [train.py:886] (1/4) Epoch 46, batch 400, loss[loss=0.008687, audio_tagging_loss=0.008687, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4286313.42 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:27:52,250 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.856e+01 4.044e+01 4.244e+01 4.925e+01, threshold=8.088e+01, percent-clipped=0.0 2023-12-24 01:27:57,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-12-24 01:28:01,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1432666.6666666667, ans=0.125 2023-12-24 01:28:07,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1432733.3333333333, ans=0.2 2023-12-24 01:28:07,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1432733.3333333333, ans=0.0 2023-12-24 01:28:17,147 INFO [train.py:886] (1/4) Epoch 46, batch 450, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4435598.07 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:28:32,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. 
limit=15.0 2023-12-24 01:28:37,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1432933.3333333333, ans=0.125 2023-12-24 01:28:49,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1433000.0, ans=0.125 2023-12-24 01:29:05,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1433066.6666666667, ans=0.0 2023-12-24 01:29:09,438 INFO [train.py:886] (1/4) Epoch 46, batch 500, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4554005.20 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:29:35,997 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.577e+01 3.884e+01 4.049e+01 4.174e+01 4.739e+01, threshold=8.098e+01, percent-clipped=0.0 2023-12-24 01:30:01,318 INFO [train.py:886] (1/4) Epoch 46, batch 550, loss[loss=0.01182, audio_tagging_loss=0.01182, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4648024.58 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:30:04,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1433466.6666666667, ans=0.0 2023-12-24 01:30:22,779 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:30:43,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1433733.3333333333, ans=0.015 2023-12-24 01:30:48,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2023-12-24 01:30:52,359 INFO [train.py:886] (1/4) Epoch 46, batch 600, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4718454.58 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:30:59,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. 
limit=22.5
2023-12-24 01:31:02,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1433866.6666666667, ans=0.125
2023-12-24 01:31:03,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1433866.6666666667, ans=0.125
2023-12-24 01:31:08,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1433866.6666666667, ans=0.125
2023-12-24 01:31:17,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1433933.3333333333, ans=0.125
2023-12-24 01:31:18,053 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.934e+01 4.117e+01 4.300e+01 4.914e+01, threshold=8.233e+01, percent-clipped=0.0
2023-12-24 01:31:20,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1433933.3333333333, ans=0.0
2023-12-24 01:31:25,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1434000.0, ans=0.2
2023-12-24 01:31:36,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1434066.6666666667, ans=0.5
2023-12-24 01:31:37,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1434066.6666666667, ans=0.125
2023-12-24 01:31:40,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1434066.6666666667, ans=0.125
2023-12-24 01:31:43,853 INFO [train.py:886] (1/4) Epoch 46, batch 650, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4760981.73 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:31:44,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0
2023-12-24 01:31:49,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0
2023-12-24 01:31:52,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1434200.0, ans=0.125
2023-12-24 01:31:57,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1434200.0, ans=0.0
2023-12-24 01:32:02,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1434266.6666666667, ans=0.125
2023-12-24 01:32:05,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1434266.6666666667, ans=0.09899494936611666
2023-12-24 01:32:18,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1434333.3333333333, ans=10.0
2023-12-24 01:32:25,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1434400.0, ans=0.0
2023-12-24 01:32:33,568 INFO [train.py:886] (1/4) Epoch 46, batch 700, loss[loss=0.01152, audio_tagging_loss=0.01152, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4795249.77 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:32:35,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0
2023-12-24 01:32:35,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0
2023-12-24 01:32:46,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1434533.3333333333, ans=0.1
2023-12-24 01:32:58,780 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.569e+01 3.947e+01 4.093e+01 4.311e+01 5.149e+01, threshold=8.186e+01, percent-clipped=0.0
2023-12-24 01:33:01,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1434600.0, ans=0.125
2023-12-24 01:33:05,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1434666.6666666667, ans=0.0
2023-12-24 01:33:08,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=15.0
2023-12-24 01:33:10,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1434666.6666666667, ans=0.125
2023-12-24 01:33:15,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1434733.3333333333, ans=0.1
2023-12-24 01:33:15,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1434733.3333333333, ans=0.95
2023-12-24 01:33:25,068 INFO [train.py:886] (1/4) Epoch 46, batch 750, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4827152.23 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:33:25,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.10 vs. limit=15.0
2023-12-24 01:33:35,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1434866.6666666667, ans=0.0
2023-12-24 01:33:44,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0
2023-12-24 01:34:00,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0
2023-12-24 01:34:16,975 INFO [train.py:886] (1/4) Epoch 46, batch 800, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4854027.13 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:34:17,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1435133.3333333333, ans=0.0
2023-12-24 01:34:21,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1435133.3333333333, ans=0.125
2023-12-24 01:34:25,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1435133.3333333333, ans=0.1
2023-12-24 01:34:28,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1435200.0, ans=0.125
2023-12-24 01:34:30,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1435200.0, ans=0.2
2023-12-24 01:34:41,861 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.539e+01 3.875e+01 4.046e+01 4.240e+01 5.244e+01, threshold=8.092e+01, percent-clipped=0.0
2023-12-24 01:34:42,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1435266.6666666667, ans=0.125
2023-12-24 01:34:42,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1435266.6666666667, ans=0.0
2023-12-24 01:34:48,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1435333.3333333333, ans=0.125
2023-12-24 01:34:58,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0
2023-12-24 01:34:59,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1435400.0, ans=0.0
2023-12-24 01:35:06,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1435400.0, ans=0.125
2023-12-24 01:35:08,501 INFO [train.py:886] (1/4) Epoch 46, batch 850, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4878223.47 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:35:23,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1435533.3333333333, ans=0.125
2023-12-24 01:35:27,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1435533.3333333333, ans=0.125
2023-12-24 01:35:28,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1435533.3333333333, ans=0.0
2023-12-24 01:35:35,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1435600.0, ans=0.05
2023-12-24 01:35:37,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1435600.0, ans=0.2
2023-12-24 01:35:44,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1435666.6666666667, ans=0.125
2023-12-24 01:35:44,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1435666.6666666667, ans=15.0
2023-12-24 01:35:50,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0
2023-12-24 01:35:55,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1435733.3333333333, ans=0.2
2023-12-24 01:36:00,432 INFO [train.py:886] (1/4) Epoch 46, batch 900, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24079.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4891050.17 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:36:07,171 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:36:25,492 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.452e+01 3.869e+01 4.061e+01 4.225e+01 5.084e+01, threshold=8.122e+01, percent-clipped=0.0
2023-12-24 01:36:27,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1435933.3333333333, ans=0.125
2023-12-24 01:36:32,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1436000.0, ans=0.125
2023-12-24 01:36:50,212 INFO [train.py:886] (1/4) Epoch 46, batch 950, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4901362.73 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:36:52,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1436133.3333333333, ans=0.05
2023-12-24 01:37:12,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:37:14,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1436266.6666666667, ans=0.0
2023-12-24 01:37:42,098 INFO [train.py:886] (1/4) Epoch 46, batch 1000, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4904866.18 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:37:53,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=12.0
2023-12-24 01:37:53,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0
2023-12-24 01:38:07,785 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 3.871e+01 4.031e+01 4.254e+01 4.824e+01, threshold=8.061e+01, percent-clipped=0.0
2023-12-24 01:38:11,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1436666.6666666667, ans=0.2
2023-12-24 01:38:15,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1436666.6666666667, ans=0.0
2023-12-24 01:38:23,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1436733.3333333333, ans=0.2
2023-12-24 01:38:23,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5
2023-12-24 01:38:25,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1436733.3333333333, ans=0.2
2023-12-24 01:38:28,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1436733.3333333333, ans=0.1
2023-12-24 01:38:32,890 INFO [train.py:886] (1/4) Epoch 46, batch 1050, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4919053.43 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:38:36,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1436800.0, ans=0.125
2023-12-24 01:38:42,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436866.6666666667, ans=0.1
2023-12-24 01:38:43,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0
2023-12-24 01:38:54,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436933.3333333333, ans=0.1
2023-12-24 01:39:24,087 INFO [train.py:886] (1/4) Epoch 46, batch 1100, loss[loss=0.00863, audio_tagging_loss=0.00863, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4930270.61 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:39:36,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1437200.0, ans=0.125
2023-12-24 01:39:37,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1437200.0, ans=0.0
2023-12-24 01:39:49,746 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.427e+01 3.840e+01 4.057e+01 4.285e+01 6.085e+01, threshold=8.114e+01, percent-clipped=0.0
2023-12-24 01:40:14,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0
2023-12-24 01:40:15,298 INFO [train.py:886] (1/4) Epoch 46, batch 1150, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4929886.07 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:40:22,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1437466.6666666667, ans=0.125
2023-12-24 01:40:31,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1437533.3333333333, ans=0.125
2023-12-24 01:40:32,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1437533.3333333333, ans=0.125
2023-12-24 01:40:34,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1437600.0, ans=0.125
2023-12-24 01:40:36,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1437600.0, ans=0.2
2023-12-24 01:40:46,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1437666.6666666667, ans=0.125
2023-12-24 01:41:01,656 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:41:05,278 INFO [train.py:886] (1/4) Epoch 46, batch 1200, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4938240.75 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:41:08,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1437800.0, ans=0.125
2023-12-24 01:41:13,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1437800.0, ans=0.0
2023-12-24 01:41:14,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1437800.0, ans=0.125
2023-12-24 01:41:15,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1437866.6666666667, ans=0.1
2023-12-24 01:41:30,967 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.478e+01 3.918e+01 4.092e+01 4.253e+01 4.725e+01, threshold=8.185e+01, percent-clipped=0.0
2023-12-24 01:41:39,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0
2023-12-24 01:41:43,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1438000.0, ans=0.0
2023-12-24 01:41:43,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1438000.0, ans=0.125
2023-12-24 01:41:57,101 INFO [train.py:886] (1/4) Epoch 46, batch 1250, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4933924.27 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:42:00,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1438133.3333333333, ans=6.0
2023-12-24 01:42:22,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1438266.6666666667, ans=0.0
2023-12-24 01:42:30,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1438333.3333333333, ans=0.0
2023-12-24 01:42:44,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1438400.0, ans=0.09899494936611666
2023-12-24 01:42:47,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0
2023-12-24 01:42:47,843 INFO [train.py:886] (1/4) Epoch 46, batch 1300, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4924920.36 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:43:01,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2023-12-24 01:43:09,421 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:43:14,369 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.645e+01 3.930e+01 4.058e+01 4.275e+01 4.949e+01, threshold=8.116e+01, percent-clipped=0.0
2023-12-24 01:43:32,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1438733.3333333333, ans=0.0
2023-12-24 01:43:33,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1438733.3333333333, ans=0.0
2023-12-24 01:43:36,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0
2023-12-24 01:43:36,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1438733.3333333333, ans=0.125
2023-12-24 01:43:39,276 INFO [train.py:886] (1/4) Epoch 46, batch 1350, loss[loss=0.009436, audio_tagging_loss=0.009436, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4931339.72 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:43:51,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1438866.6666666667, ans=10.0
2023-12-24 01:44:04,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1438933.3333333333, ans=0.1
2023-12-24 01:44:24,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1439066.6666666667, ans=0.04949747468305833
2023-12-24 01:44:25,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1439066.6666666667, ans=0.0
2023-12-24 01:44:31,780 INFO [train.py:886] (1/4) Epoch 46, batch 1400, loss[loss=0.009914, audio_tagging_loss=0.009914, over 24750.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4933613.38 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 64.0
2023-12-24 01:44:56,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5
2023-12-24 01:44:58,227 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.346e+01 3.871e+01 4.064e+01 4.207e+01 5.055e+01, threshold=8.128e+01, percent-clipped=0.0
2023-12-24 01:44:59,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1439266.6666666667, ans=0.0
2023-12-24 01:45:06,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1439333.3333333333, ans=0.0
2023-12-24 01:45:18,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.43 vs. limit=12.0
2023-12-24 01:45:24,845 INFO [train.py:886] (1/4) Epoch 46, batch 1450, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4936375.93 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:45:27,327 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=12.0
2023-12-24 01:45:33,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1439533.3333333333, ans=0.125
2023-12-24 01:46:14,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1439800.0, ans=0.125
2023-12-24 01:46:15,274 INFO [train.py:886] (1/4) Epoch 46, batch 1500, loss[loss=0.01297, audio_tagging_loss=0.01297, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4941326.47 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:46:17,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1439800.0, ans=0.0
2023-12-24 01:46:24,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0
2023-12-24 01:46:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1439866.6666666667, ans=0.0
2023-12-24 01:46:32,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1439866.6666666667, ans=0.0
2023-12-24 01:46:32,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.46 vs. limit=22.5
2023-12-24 01:46:37,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0
2023-12-24 01:46:37,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0
2023-12-24 01:46:39,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1439933.3333333333, ans=0.125
2023-12-24 01:46:41,575 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.511e+01 3.911e+01 4.080e+01 4.273e+01 5.286e+01, threshold=8.160e+01, percent-clipped=0.0
2023-12-24 01:47:07,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1440066.6666666667, ans=0.07
2023-12-24 01:47:10,229 INFO [train.py:886] (1/4) Epoch 46, batch 1550, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4944326.57 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:47:11,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1440133.3333333333, ans=0.125
2023-12-24 01:47:47,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2023-12-24 01:48:02,300 INFO [train.py:886] (1/4) Epoch 46, batch 1600, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4939588.68 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:48:26,623 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.905e+01 4.113e+01 4.286e+01 4.788e+01, threshold=8.225e+01, percent-clipped=0.0
2023-12-24 01:48:26,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1440600.0, ans=0.125
2023-12-24 01:48:33,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1440666.6666666667, ans=0.0
2023-12-24 01:48:43,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1440733.3333333333, ans=0.125
2023-12-24 01:48:44,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=1440733.3333333333, ans=0.2
2023-12-24 01:48:52,882 INFO [train.py:886] (1/4) Epoch 46, batch 1650, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4941037.53 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:49:19,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0
2023-12-24 01:49:23,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1441000.0, ans=0.125
2023-12-24 01:49:38,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1441066.6666666667, ans=0.1
2023-12-24 01:49:44,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1441066.6666666667, ans=0.125
2023-12-24 01:49:46,075 INFO [train.py:886] (1/4) Epoch 46, batch 1700, loss[loss=0.01013, audio_tagging_loss=0.01013, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4945354.96 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:50:01,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1441200.0, ans=0.125
2023-12-24 01:50:05,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1441266.6666666667, ans=0.125
2023-12-24 01:50:08,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0
2023-12-24 01:50:11,913 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.496e+01 3.860e+01 4.005e+01 4.201e+01 5.398e+01, threshold=8.010e+01, percent-clipped=0.0
2023-12-24 01:50:17,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0
2023-12-24 01:50:18,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1441333.3333333333, ans=0.125
2023-12-24 01:50:28,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=1441400.0, ans=0.2
2023-12-24 01:50:30,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1441400.0, ans=0.07
2023-12-24 01:50:31,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1441400.0, ans=0.125
2023-12-24 01:50:33,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1441400.0, ans=0.0
2023-12-24 01:50:36,430 INFO [train.py:886] (1/4) Epoch 46, batch 1750, loss[loss=0.009966, audio_tagging_loss=0.009966, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4943575.31 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:50:37,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=8.0
2023-12-24 01:50:39,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1441466.6666666667, ans=0.1
2023-12-24 01:50:41,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1441466.6666666667, ans=0.04949747468305833
2023-12-24 01:51:13,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1441666.6666666667, ans=0.0
2023-12-24 01:51:24,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1441733.3333333333, ans=0.125
2023-12-24 01:51:29,196 INFO [train.py:886] (1/4) Epoch 46, batch 1800, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4952925.29 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:51:44,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1441866.6666666667, ans=0.1
2023-12-24 01:51:47,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.08 vs. limit=22.5
2023-12-24 01:51:50,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1441933.3333333333, ans=0.0
2023-12-24 01:51:55,677 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.556e+01 3.867e+01 4.060e+01 4.230e+01 5.187e+01, threshold=8.121e+01, percent-clipped=0.0
2023-12-24 01:51:55,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1441933.3333333333, ans=0.125
2023-12-24 01:52:05,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1442000.0, ans=0.125
2023-12-24 01:52:13,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1442066.6666666667, ans=0.0
2023-12-24 01:52:18,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=22.5
2023-12-24 01:52:20,842 INFO [train.py:886] (1/4) Epoch 46, batch 1850, loss[loss=0.008825, audio_tagging_loss=0.008825, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4953316.17 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:52:33,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0
2023-12-24 01:52:40,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1442266.6666666667, ans=0.125
2023-12-24 01:52:58,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1442333.3333333333, ans=10.0
2023-12-24 01:53:00,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1442333.3333333333, ans=0.1
2023-12-24 01:53:06,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1442400.0, ans=0.2
2023-12-24 01:53:12,154 INFO [train.py:886] (1/4) Epoch 46, batch 1900, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4948982.25 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:53:26,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1442533.3333333333, ans=0.125
2023-12-24 01:53:38,754 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+01 3.923e+01 4.090e+01 4.316e+01 4.935e+01, threshold=8.180e+01, percent-clipped=0.0
2023-12-24 01:53:41,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1442600.0, ans=0.125
2023-12-24 01:53:45,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1442666.6666666667, ans=10.0
2023-12-24 01:53:51,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0
2023-12-24 01:53:58,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1442733.3333333333, ans=0.1
2023-12-24 01:54:05,336 INFO [train.py:886] (1/4) Epoch 46, batch 1950, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4952193.53 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:54:05,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1442800.0, ans=0.0
2023-12-24 01:54:09,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1442800.0, ans=0.125
2023-12-24 01:54:34,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1442933.3333333333, ans=0.1
2023-12-24 01:54:38,824 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:54:56,377 INFO [train.py:886] (1/4) Epoch 46, batch 2000, loss[loss=0.009166, audio_tagging_loss=0.009166, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4950629.17 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:55:01,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1443133.3333333333, ans=0.0
2023-12-24 01:55:05,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-12-24 01:55:10,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1443200.0, ans=0.0
2023-12-24 01:55:12,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1443200.0, ans=0.125
2023-12-24 01:55:22,165 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.855e+01 4.032e+01 4.223e+01 5.008e+01, threshold=8.064e+01, percent-clipped=0.0
2023-12-24 01:55:25,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=12.0
2023-12-24 01:55:30,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1443333.3333333333, ans=0.125
2023-12-24 01:55:45,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1443400.0, ans=0.035
2023-12-24 01:55:46,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0
2023-12-24 01:55:48,851 INFO [train.py:886] (1/4) Epoch 46, batch 2050, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4939555.09 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:55:50,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1443466.6666666667, ans=0.125
2023-12-24 01:55:56,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1443466.6666666667, ans=0.125
2023-12-24 01:55:56,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1443466.6666666667, ans=0.125
2023-12-24 01:56:02,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1443533.3333333333, ans=0.0
2023-12-24 01:56:12,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1443600.0, ans=0.1
2023-12-24 01:56:21,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1443666.6666666667, ans=0.0
2023-12-24 01:56:25,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1443666.6666666667, ans=0.07
2023-12-24 01:56:27,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0
2023-12-24 01:56:32,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1443733.3333333333, ans=0.125
2023-12-24 01:56:41,057 INFO [train.py:886] (1/4) Epoch 46, batch 2100, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4951250.52 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:56:42,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1443800.0, ans=0.125
2023-12-24 01:57:02,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1443933.3333333333, ans=0.1
2023-12-24 01:57:05,944 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.524e+01 3.905e+01 4.022e+01 4.224e+01 4.545e+01, threshold=8.045e+01, percent-clipped=0.0
2023-12-24 01:57:08,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0
2023-12-24 01:57:17,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1444000.0, ans=0.0
2023-12-24 01:57:32,019 INFO [train.py:886] (1/4) Epoch 46, batch 2150, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4948227.05 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:57:48,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0
2023-12-24 01:58:02,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1444333.3333333333, ans=0.1
2023-12-24 01:58:20,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1444400.0, ans=0.125
2023-12-24 01:58:24,358 INFO [train.py:886] (1/4) Epoch 46, batch 2200, loss[loss=0.01227, audio_tagging_loss=0.01227, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4945072.23 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:58:27,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1444466.6666666667, ans=0.0
2023-12-24 01:58:28,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1444466.6666666667, ans=0.125
2023-12-24 01:58:31,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=22.5
2023-12-24 01:58:36,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1444533.3333333333, ans=0.125
2023-12-24 01:58:43,023 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:58:50,837 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.513e+01 3.958e+01 4.112e+01 4.314e+01 5.314e+01, threshold=8.224e+01, percent-clipped=0.0
2023-12-24 01:58:54,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1444666.6666666667, ans=0.0
2023-12-24 01:59:16,823 INFO [train.py:886] (1/4) Epoch 46, batch 2250, loss[loss=0.009463, audio_tagging_loss=0.009463, over 24750.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4942410.61 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:59:18,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1444800.0, ans=0.2
2023-12-24 01:59:21,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1444800.0, ans=0.125
2023-12-24 01:59:39,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444933.3333333333, ans=0.1
2023-12-24 02:00:08,442 INFO [train.py:886] (1/4) Epoch 46, batch 2300, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4941197.89 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:00:09,583 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:00:12,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.64 vs. limit=10.0
2023-12-24 02:00:17,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1445200.0, ans=0.05
2023-12-24 02:00:23,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2023-12-24 02:00:31,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2023-12-24 02:00:34,262 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.386e+01 3.894e+01 4.073e+01 4.228e+01 5.336e+01, threshold=8.145e+01, percent-clipped=0.0
2023-12-24 02:00:41,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.99 vs. limit=15.0
2023-12-24 02:00:44,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.00 vs. limit=10.0
2023-12-24 02:01:00,766 INFO [train.py:886] (1/4) Epoch 46, batch 2350, loss[loss=0.009732, audio_tagging_loss=0.009732, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4945250.04 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:01:17,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5
2023-12-24 02:01:25,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1445600.0, ans=0.125
2023-12-24 02:01:37,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0
2023-12-24 02:01:43,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0
2023-12-24 02:01:45,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1445733.3333333333, ans=0.2
2023-12-24 02:01:51,092 INFO [train.py:886] (1/4) Epoch 46, batch 2400, loss[loss=0.01024, audio_tagging_loss=0.01024, over 21464.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4951308.85 frames. ], batch size: 107, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:01:51,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1445800.0, ans=0.0
2023-12-24 02:02:09,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1445866.6666666667, ans=0.125
2023-12-24 02:02:16,979 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.926e+01 4.069e+01 4.266e+01 5.020e+01, threshold=8.138e+01, percent-clipped=0.0
2023-12-24 02:02:17,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5
2023-12-24 02:02:29,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1446000.0, ans=0.125
2023-12-24 02:02:34,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1446066.6666666667, ans=0.2
2023-12-24 02:02:39,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0
2023-12-24 02:02:43,285 INFO [train.py:886] (1/4) Epoch 46, batch 2450, loss[loss=0.008418, audio_tagging_loss=0.008418, over 23998.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4952057.11 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:02:55,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1446200.0, ans=0.1
2023-12-24 02:03:21,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1446333.3333333333, ans=0.125
2023-12-24 02:03:25,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0
2023-12-24 02:03:29,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1446400.0, ans=10.0
2023-12-24 02:03:35,351 INFO [train.py:886] (1/4) Epoch 46, batch 2500, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4953935.83 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:03:36,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1446466.6666666667, ans=0.125
2023-12-24 02:03:42,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.89 vs. limit=15.0
2023-12-24 02:03:57,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0
2023-12-24 02:04:00,419 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.680e+01 3.970e+01 4.120e+01 4.239e+01 5.060e+01, threshold=8.241e+01, percent-clipped=0.0
2023-12-24 02:04:03,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5
2023-12-24 02:04:25,317 INFO [train.py:886] (1/4) Epoch 46, batch 2550, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24052.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4952910.82 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:04:28,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1446800.0, ans=0.125
2023-12-24 02:04:29,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446800.0, ans=0.1
2023-12-24 02:04:33,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1446800.0, ans=0.125
2023-12-24 02:04:50,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446933.3333333333, ans=0.1
2023-12-24 02:05:05,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=22.5
2023-12-24 02:05:10,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.26 vs. limit=22.5
2023-12-24 02:05:13,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0
2023-12-24 02:05:16,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0
2023-12-24 02:05:18,331 INFO [train.py:886] (1/4) Epoch 46, batch 2600, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4949504.73 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:05:29,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1447200.0, ans=0.125
2023-12-24 02:05:44,710 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.567e+01 3.903e+01 4.068e+01 4.253e+01 4.776e+01, threshold=8.137e+01, percent-clipped=0.0
2023-12-24 02:05:50,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0
2023-12-24 02:05:59,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1447400.0, ans=0.95
2023-12-24 02:06:00,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1447400.0, ans=0.0
2023-12-24 02:06:01,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1447400.0, ans=0.0
2023-12-24 02:06:07,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1447400.0, ans=0.125
2023-12-24 02:06:09,964 INFO [train.py:886] (1/4) Epoch 46, batch 2650, loss[loss=0.009863, audio_tagging_loss=0.009863, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4955264.74 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:06:17,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1447466.6666666667, ans=0.0
2023-12-24 02:06:18,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0
2023-12-24 02:06:25,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1447533.3333333333, ans=0.1
2023-12-24 02:06:40,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1447666.6666666667, ans=0.125
2023-12-24 02:06:47,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1447666.6666666667, ans=0.1
2023-12-24 02:06:51,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0
2023-12-24 02:06:57,000 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:06:58,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1447733.3333333333, ans=0.125
2023-12-24 02:06:58,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1447733.3333333333, ans=0.0
2023-12-24 02:07:01,527 INFO [train.py:886] (1/4) Epoch 46, batch 2700, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4953751.55 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:07:01,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1447800.0, ans=0.125
2023-12-24 02:07:15,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1447866.6666666667, ans=0.0
2023-12-24 02:07:21,520 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:07:22,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1447933.3333333333, ans=0.125
2023-12-24 02:07:27,935 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.425e+01 3.925e+01 4.049e+01 4.308e+01 4.721e+01, threshold=8.099e+01, percent-clipped=0.0
2023-12-24 02:07:35,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1448000.0, ans=0.2
2023-12-24 02:07:53,863 INFO [train.py:886] (1/4) Epoch 46, batch 2750, loss[loss=0.01025, audio_tagging_loss=0.01025, over 25000.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4955576.44 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:08:27,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0
2023-12-24 02:08:28,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1448333.3333333333, ans=0.125
2023-12-24 02:08:32,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1448333.3333333333, ans=0.0
2023-12-24 02:08:32,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1448333.3333333333, ans=0.125
2023-12-24 02:08:43,414 INFO [train.py:886] (1/4) Epoch 46, batch 2800, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4952426.34 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:08:48,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1448466.6666666667, ans=0.125
2023-12-24 02:09:00,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1448533.3333333333, ans=0.125
2023-12-24 02:09:09,792 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.906e+01 4.083e+01 4.345e+01 5.056e+01, threshold=8.167e+01, percent-clipped=0.0
2023-12-24 02:09:13,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1448666.6666666667, ans=0.125
2023-12-24 02:09:19,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1448666.6666666667, ans=0.0
2023-12-24 02:09:21,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1448666.6666666667, ans=0.1
2023-12-24 02:09:22,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1448666.6666666667, ans=0.125
2023-12-24 02:09:35,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1448800.0, ans=0.0
2023-12-24 02:09:36,215 INFO [train.py:886] (1/4) Epoch 46, batch 2850, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4947584.82 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:09:48,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0
2023-12-24 02:10:05,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1448933.3333333333, ans=0.025
2023-12-24 02:10:09,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1449000.0, ans=0.125
2023-12-24 02:10:10,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1449000.0, ans=0.0
2023-12-24 02:10:25,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1449066.6666666667, ans=0.125
2023-12-24 02:10:28,356 INFO [train.py:886] (1/4) Epoch 46, batch 2900, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4947488.91 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:10:34,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1449133.3333333333, ans=0.0
2023-12-24 02:10:38,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0
2023-12-24 02:10:40,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1449200.0, ans=0.125
2023-12-24 02:10:53,597 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.893e+01 4.087e+01 4.310e+01 5.363e+01, threshold=8.174e+01, percent-clipped=0.0
2023-12-24 02:11:12,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1449400.0, ans=0.125
2023-12-24 02:11:19,442 INFO [train.py:886] (1/4) Epoch 46, batch 2950, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4951052.62 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:11:24,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1449466.6666666667, ans=0.1
2023-12-24 02:11:33,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1449533.3333333333, ans=0.1
2023-12-24 02:12:12,486 INFO [train.py:886] (1/4) Epoch 46, batch 3000, loss[loss=0.009912, audio_tagging_loss=0.009912, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4954440.64 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:12:12,486 INFO [train.py:909] (1/4) Computing validation loss
2023-12-24 02:12:19,985 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6213, 3.7487, 3.4632, 3.2700], device='cuda:1')
2023-12-24 02:12:34,123 INFO [train.py:917] (1/4) Epoch 46, validation: loss=0.03679, audio_tagging_loss=0.03679, over 3737520.00 frames.
2023-12-24 02:12:34,124 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-24 02:12:34,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0
2023-12-24 02:12:54,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1449933.3333333333, ans=0.125
2023-12-24 02:12:58,522 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.890e+01 4.114e+01 4.303e+01 5.269e+01, threshold=8.229e+01, percent-clipped=0.0
2023-12-24 02:13:08,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1450000.0, ans=0.1
2023-12-24 02:13:25,003 INFO [train.py:886] (1/4) Epoch 46, batch 3050, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4956490.52 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:13:33,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0
2023-12-24 02:13:44,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1450200.0, ans=0.1
2023-12-24 02:13:55,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1450333.3333333333, ans=0.1
2023-12-24 02:13:58,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1450333.3333333333, ans=0.125
2023-12-24 02:13:58,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1450333.3333333333, ans=0.125
2023-12-24 02:14:05,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1450400.0, ans=0.05
2023-12-24 02:14:11,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1450400.0, ans=0.125
2023-12-24 02:14:16,879 INFO [train.py:886] (1/4) Epoch 46, batch 3100, loss[loss=0.009925, audio_tagging_loss=0.009925, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4959175.19 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:14:22,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1450466.6666666667, ans=0.0
2023-12-24 02:14:25,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1450533.3333333333, ans=0.125
2023-12-24 02:14:38,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1450600.0, ans=0.2
2023-12-24 02:14:41,856 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.614e+01 3.922e+01 4.127e+01 4.313e+01 5.087e+01, threshold=8.254e+01, percent-clipped=0.0
2023-12-24 02:14:42,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0
2023-12-24 02:14:50,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1450666.6666666667, ans=0.07
2023-12-24 02:15:07,069 INFO [train.py:886] (1/4) Epoch 46, batch 3150, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24944.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4946132.89 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:15:10,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1450800.0, ans=0.1
2023-12-24 02:15:19,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5
2023-12-24 02:15:40,384 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:15:58,402 INFO [train.py:886] (1/4) Epoch 46, batch 3200, loss[loss=0.009797, audio_tagging_loss=0.009797, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4946782.38 frames.
], batch size: 100, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:15:59,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1451133.3333333333, ans=0.0 2023-12-24 02:16:24,225 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.595e+01 3.931e+01 4.109e+01 4.308e+01 5.073e+01, threshold=8.218e+01, percent-clipped=0.0 2023-12-24 02:16:32,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1451333.3333333333, ans=0.0 2023-12-24 02:16:50,751 INFO [train.py:886] (1/4) Epoch 46, batch 3250, loss[loss=0.01127, audio_tagging_loss=0.01127, over 21359.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4946551.24 frames. ], batch size: 107, lr: 2.33e-03, grad_scale: 64.0 2023-12-24 02:16:52,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1451466.6666666667, ans=0.125 2023-12-24 02:16:53,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1451466.6666666667, ans=0.04949747468305833 2023-12-24 02:17:02,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-12-24 02:17:06,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=12.0 2023-12-24 02:17:07,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1451533.3333333333, ans=0.125 2023-12-24 02:17:23,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1451666.6666666667, ans=0.0 2023-12-24 02:17:23,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1451666.6666666667, ans=0.2 2023-12-24 02:17:41,156 INFO [train.py:886] (1/4) Epoch 46, batch 3300, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4948465.08 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:17:50,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1451800.0, ans=0.0 2023-12-24 02:17:53,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-12-24 02:18:04,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1451933.3333333333, ans=0.1 2023-12-24 02:18:06,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1451933.3333333333, ans=0.125 2023-12-24 02:18:07,485 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.526e+01 3.879e+01 4.032e+01 4.165e+01 5.063e+01, threshold=8.064e+01, percent-clipped=0.0 2023-12-24 02:18:33,750 INFO [train.py:886] (1/4) Epoch 46, batch 3350, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4952693.13 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:18:53,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1452266.6666666667, ans=0.05 2023-12-24 02:18:58,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1452266.6666666667, ans=0.125 2023-12-24 02:19:11,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1452333.3333333333, ans=0.0 2023-12-24 02:19:19,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1452400.0, ans=0.125 2023-12-24 02:19:25,267 INFO [train.py:886] (1/4) Epoch 46, batch 3400, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4957527.16 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:19:44,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1452533.3333333333, ans=0.1 2023-12-24 02:19:50,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1452600.0, ans=0.1 2023-12-24 02:19:51,846 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.591e+01 3.965e+01 4.111e+01 4.276e+01 8.253e+01, threshold=8.223e+01, percent-clipped=1.0 2023-12-24 02:19:54,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1452600.0, ans=0.1 2023-12-24 02:20:01,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1452666.6666666667, ans=0.05 2023-12-24 02:20:06,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1452733.3333333333, ans=0.04949747468305833 2023-12-24 02:20:07,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1452733.3333333333, ans=0.0 2023-12-24 02:20:17,510 INFO [train.py:886] (1/4) Epoch 46, batch 3450, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4952314.57 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:20:19,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1452800.0, ans=0.1 2023-12-24 02:20:30,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1452866.6666666667, ans=0.2 2023-12-24 02:21:09,814 INFO [train.py:886] (1/4) Epoch 46, batch 3500, loss[loss=0.01168, audio_tagging_loss=0.01168, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4944339.59 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:21:29,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1453266.6666666667, ans=0.125 2023-12-24 02:21:34,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1453266.6666666667, ans=0.07 2023-12-24 02:21:37,138 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.883e+01 4.040e+01 4.247e+01 5.009e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 02:21:37,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453266.6666666667, ans=0.1 2023-12-24 02:21:44,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1453333.3333333333, ans=0.0 2023-12-24 02:21:48,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1453333.3333333333, ans=0.0 2023-12-24 02:21:48,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453333.3333333333, ans=0.1 2023-12-24 02:21:52,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1453400.0, ans=0.0 2023-12-24 02:22:01,502 INFO [train.py:886] (1/4) Epoch 46, batch 3550, loss[loss=0.00887, audio_tagging_loss=0.00887, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4944928.56 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:22:25,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1453600.0, ans=0.2 2023-12-24 02:22:34,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1453666.6666666667, ans=0.125 2023-12-24 02:22:39,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1453666.6666666667, ans=0.125 2023-12-24 02:22:39,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1453666.6666666667, ans=0.0 2023-12-24 02:22:53,312 INFO [train.py:886] (1/4) Epoch 46, batch 3600, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4950049.89 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:22:53,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1453800.0, ans=0.2 2023-12-24 02:23:14,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1453933.3333333333, ans=0.015 2023-12-24 02:23:18,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.27 vs. 
limit=15.0 2023-12-24 02:23:19,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1453933.3333333333, ans=0.125 2023-12-24 02:23:20,531 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.432e+01 3.928e+01 4.098e+01 4.249e+01 6.702e+01, threshold=8.195e+01, percent-clipped=0.0 2023-12-24 02:23:29,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-24 02:23:31,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1454000.0, ans=0.0 2023-12-24 02:23:35,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1454066.6666666667, ans=0.125 2023-12-24 02:23:38,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1454066.6666666667, ans=0.07 2023-12-24 02:23:43,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1454066.6666666667, ans=0.0 2023-12-24 02:23:46,092 INFO [train.py:886] (1/4) Epoch 46, batch 3650, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4957608.43 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:24:04,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1454266.6666666667, ans=0.0 2023-12-24 02:24:36,125 INFO [train.py:886] (1/4) Epoch 46, batch 3700, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4959470.12 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:24:52,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1454533.3333333333, ans=0.0 2023-12-24 02:24:54,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1454533.3333333333, ans=0.125 2023-12-24 02:25:03,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1454600.0, ans=15.0 2023-12-24 02:25:03,712 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.918e+01 4.062e+01 4.194e+01 4.815e+01, threshold=8.124e+01, percent-clipped=0.0 2023-12-24 02:25:10,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1454666.6666666667, ans=0.125 2023-12-24 02:25:17,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1454733.3333333333, ans=0.125 2023-12-24 02:25:20,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1454733.3333333333, ans=0.1 2023-12-24 02:25:20,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1454733.3333333333, ans=0.04949747468305833 2023-12-24 02:25:25,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1454733.3333333333, ans=0.0 2023-12-24 02:25:27,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1454733.3333333333, ans=0.0 2023-12-24 02:25:28,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1454800.0, ans=10.0 2023-12-24 02:25:29,637 INFO [train.py:886] (1/4) Epoch 46, batch 3750, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4956325.21 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:26:01,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1455000.0, ans=0.1 2023-12-24 02:26:03,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1455000.0, ans=0.0 2023-12-24 02:26:04,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1455000.0, ans=0.2 2023-12-24 02:26:07,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1455000.0, ans=0.125 2023-12-24 02:26:20,515 INFO [train.py:886] (1/4) Epoch 46, batch 3800, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4949816.64 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:26:39,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1455200.0, ans=0.125 2023-12-24 02:26:45,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1455266.6666666667, ans=0.125 2023-12-24 02:26:46,645 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.525e+01 3.960e+01 4.079e+01 4.273e+01 4.996e+01, threshold=8.158e+01, percent-clipped=0.0 2023-12-24 02:26:57,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1455333.3333333333, ans=0.0 2023-12-24 02:26:59,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1455333.3333333333, ans=0.1 2023-12-24 02:27:01,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1455400.0, ans=0.0 2023-12-24 02:27:02,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1455400.0, ans=0.125 2023-12-24 02:27:02,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1455400.0, ans=0.1 2023-12-24 02:27:11,916 INFO [train.py:886] (1/4) Epoch 46, batch 3850, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24053.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4946852.21 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:27:12,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1455466.6666666667, ans=0.125 2023-12-24 02:27:13,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1455466.6666666667, ans=0.125 2023-12-24 02:27:17,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1455466.6666666667, ans=0.0 2023-12-24 02:27:18,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2023-12-24 02:27:18,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1455466.6666666667, ans=0.1 2023-12-24 02:27:45,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1455666.6666666667, ans=0.125 2023-12-24 02:28:03,998 INFO [train.py:886] (1/4) Epoch 46, batch 3900, loss[loss=0.009445, audio_tagging_loss=0.009445, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4952074.92 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:28:04,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1455800.0, ans=0.125 2023-12-24 02:28:13,885 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:28:14,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1455866.6666666667, ans=0.0 2023-12-24 02:28:30,031 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.541e+01 3.894e+01 4.050e+01 4.339e+01 5.039e+01, threshold=8.100e+01, percent-clipped=0.0 2023-12-24 02:28:47,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1456066.6666666667, ans=0.0 2023-12-24 02:28:54,397 INFO [train.py:886] (1/4) Epoch 46, batch 3950, loss[loss=0.009833, audio_tagging_loss=0.009833, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4951885.98 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:28:55,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2023-12-24 02:29:00,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1456133.3333333333, ans=0.0 2023-12-24 02:29:25,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1456333.3333333333, ans=0.125 2023-12-24 02:29:32,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1456333.3333333333, ans=0.0 2023-12-24 02:29:34,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1456400.0, ans=0.0 2023-12-24 02:29:46,383 INFO [train.py:886] (1/4) Epoch 46, batch 4000, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4955351.52 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:29:55,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1456533.3333333333, ans=0.125 2023-12-24 02:30:10,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1456600.0, ans=0.125 2023-12-24 02:30:13,630 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.650e+01 3.977e+01 4.098e+01 4.271e+01 5.184e+01, threshold=8.196e+01, percent-clipped=0.0 2023-12-24 02:30:19,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1456666.6666666667, ans=0.2 2023-12-24 02:30:37,714 INFO [train.py:886] (1/4) Epoch 46, batch 4050, loss[loss=0.008983, audio_tagging_loss=0.008983, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4955084.02 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:31:28,349 INFO [train.py:886] (1/4) Epoch 46, batch 4100, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24007.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4949361.95 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:31:30,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1457133.3333333333, ans=0.2 2023-12-24 02:31:54,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2023-12-24 02:31:55,139 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.971e+01 4.093e+01 4.225e+01 5.395e+01, threshold=8.186e+01, percent-clipped=0.0 2023-12-24 02:31:59,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1457333.3333333333, ans=0.025 2023-12-24 02:32:20,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1457466.6666666667, ans=0.2 2023-12-24 02:32:21,001 INFO [train.py:886] (1/4) Epoch 46, batch 4150, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4944685.99 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:32:35,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1457533.3333333333, ans=0.5 2023-12-24 02:32:40,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2023-12-24 02:32:43,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1457600.0, ans=0.125 2023-12-24 02:32:56,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=12.0 2023-12-24 02:32:58,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-12-24 02:33:10,868 INFO [train.py:886] (1/4) Epoch 46, batch 4200, loss[loss=0.0108, audio_tagging_loss=0.0108, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4942241.38 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:33:28,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1457866.6666666667, ans=0.2 2023-12-24 02:33:33,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1457933.3333333333, ans=0.1 2023-12-24 02:33:38,203 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.862e+01 4.047e+01 4.185e+01 5.649e+01, threshold=8.095e+01, percent-clipped=0.0 2023-12-24 02:33:46,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1458000.0, ans=0.2 2023-12-24 02:33:49,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1458000.0, ans=0.05 2023-12-24 02:34:03,964 INFO [train.py:886] (1/4) Epoch 46, batch 4250, loss[loss=0.009644, audio_tagging_loss=0.009644, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4940833.20 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:34:11,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2023-12-24 02:34:18,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1458200.0, ans=0.125 2023-12-24 02:34:55,817 INFO [train.py:886] (1/4) Epoch 46, batch 4300, loss[loss=0.009922, audio_tagging_loss=0.009922, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4944656.27 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:35:04,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1458533.3333333333, ans=0.125 2023-12-24 02:35:08,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1458533.3333333333, ans=0.5 2023-12-24 02:35:12,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1458533.3333333333, ans=0.035 2023-12-24 02:35:21,230 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.901e+01 4.132e+01 4.342e+01 5.346e+01, threshold=8.265e+01, percent-clipped=0.0 2023-12-24 02:35:23,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-24 02:35:24,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.33 vs. limit=10.0 2023-12-24 02:35:31,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1458666.6666666667, ans=0.0 2023-12-24 02:35:33,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1458666.6666666667, ans=0.125 2023-12-24 02:35:42,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-12-24 02:35:43,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1458733.3333333333, ans=0.2 2023-12-24 02:35:46,813 INFO [train.py:886] (1/4) Epoch 46, batch 4350, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4945571.57 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:35:51,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1458800.0, ans=0.1 2023-12-24 02:36:00,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1458866.6666666667, ans=0.0 2023-12-24 02:36:06,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1458866.6666666667, ans=0.125 2023-12-24 02:36:34,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1459066.6666666667, ans=0.125 2023-12-24 02:36:39,148 INFO [train.py:886] (1/4) Epoch 46, batch 4400, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24938.00 frames. 
], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4948271.78 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:36:55,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.33 vs. limit=5.0 2023-12-24 02:37:05,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1459266.6666666667, ans=0.125 2023-12-24 02:37:06,113 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.527e+01 3.966e+01 4.108e+01 4.313e+01 4.794e+01, threshold=8.216e+01, percent-clipped=0.0 2023-12-24 02:37:09,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1459333.3333333333, ans=0.0 2023-12-24 02:37:12,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1459333.3333333333, ans=0.125 2023-12-24 02:37:26,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1459400.0, ans=0.1 2023-12-24 02:37:30,588 INFO [train.py:886] (1/4) Epoch 46, batch 4450, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4945728.15 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:37:53,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1459600.0, ans=0.125 2023-12-24 02:37:57,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1459600.0, ans=0.025 2023-12-24 02:38:11,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.84 vs. limit=22.5 2023-12-24 02:38:22,188 INFO [train.py:886] (1/4) Epoch 46, batch 4500, loss[loss=0.009385, audio_tagging_loss=0.009385, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4953133.88 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:38:26,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1459800.0, ans=0.125 2023-12-24 02:38:26,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1459800.0, ans=0.0 2023-12-24 02:38:35,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1459866.6666666667, ans=0.125 2023-12-24 02:38:49,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.485e+01 3.900e+01 4.113e+01 4.259e+01 4.782e+01, threshold=8.226e+01, percent-clipped=0.0 2023-12-24 02:38:50,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1459933.3333333333, ans=0.0 2023-12-24 02:39:00,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1460000.0, ans=0.125 2023-12-24 02:39:14,698 INFO [train.py:886] (1/4) Epoch 46, batch 4550, loss[loss=0.01056, audio_tagging_loss=0.01056, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4953997.40 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:39:24,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1460200.0, ans=0.125 2023-12-24 02:39:35,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1460266.6666666667, ans=0.2 2023-12-24 02:39:47,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1460333.3333333333, ans=0.0 2023-12-24 02:39:54,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1460400.0, ans=0.1 2023-12-24 02:40:05,495 INFO [train.py:886] (1/4) Epoch 46, batch 4600, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4956098.33 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:40:19,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1460533.3333333333, ans=0.125 2023-12-24 02:40:19,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1460533.3333333333, ans=0.2 2023-12-24 02:40:33,137 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.440e+01 3.978e+01 4.125e+01 4.323e+01 5.544e+01, threshold=8.249e+01, percent-clipped=0.0 2023-12-24 02:40:39,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1460666.6666666667, ans=0.0 2023-12-24 02:40:57,032 INFO [train.py:886] (1/4) Epoch 46, batch 4650, loss[loss=0.008851, audio_tagging_loss=0.008851, over 22367.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4955338.72 frames. ], batch size: 107, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:40:57,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1460800.0, ans=0.0 2023-12-24 02:40:58,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1460800.0, ans=0.125 2023-12-24 02:41:01,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1460800.0, ans=0.1 2023-12-24 02:41:41,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1461066.6666666667, ans=0.125 2023-12-24 02:41:46,467 INFO [train.py:886] (1/4) Epoch 46, batch 4700, loss[loss=0.009672, audio_tagging_loss=0.009672, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4948609.39 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:41:58,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1461200.0, ans=0.2 2023-12-24 02:42:01,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1461200.0, ans=0.2 2023-12-24 02:42:12,851 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.719e+01 3.992e+01 4.134e+01 4.373e+01 5.124e+01, threshold=8.269e+01, percent-clipped=0.0 2023-12-24 02:42:17,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5 2023-12-24 02:42:27,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1461400.0, ans=0.125 2023-12-24 02:42:34,330 INFO [train.py:886] (1/4) Epoch 46, batch 4750, loss[loss=0.00968, audio_tagging_loss=0.00968, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4941720.65 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:42:38,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1461466.6666666667, ans=0.0 2023-12-24 02:42:42,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1461533.3333333333, ans=0.125 2023-12-24 02:43:10,060 INFO [train.py:886] (1/4) Epoch 47, batch 0, loss[loss=0.02111, audio_tagging_loss=0.02111, over 23988.00 frames. ], tot_loss[loss=0.02111, audio_tagging_loss=0.02111, over 23988.00 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:43:10,061 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 02:43:20,372 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6532, 4.0149, 4.1995, 3.9283], device='cuda:1') 2023-12-24 02:43:30,566 INFO [train.py:917] (1/4) Epoch 47, validation: loss=0.0358, audio_tagging_loss=0.0358, over 3737520.00 frames. 
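Three kinds of diagnostic lines dominate this stretch of the log: ScheduledFloat values from scaling.py:213, grad-norm clipping warnings from optim.py:484, and Whitening metrics from scaling.py:1022. Each ScheduledFloat entry reports the current value (ans) of a scalar hyperparameter, such as a dropout probability, skip rate, or bypass scale, as a function of batch_count. As a minimal sketch only, the class below shows how a piecewise-linear schedule of this kind could behave; the class name mirrors the log, but the constructor signature, the value_at method, and the breakpoints are illustrative assumptions, not icefall's actual scaling.py implementation:

from bisect import bisect_right

class ScheduledFloat:
    """Sketch of a piecewise-linear schedule over batch_count:
    (batch_count, value) breakpoints, held constant before the first
    and after the last breakpoint."""

    def __init__(self, *points, name="unnamed"):
        self.name = name
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, batch_count)
        if i == 0:                     # before the first breakpoint
            return self.points[0][1]
        if i == len(self.points):      # past the last breakpoint
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Illustrative breakpoints (not taken from the recipe): decay from 0.3
# to 0.125 over the first 20k batches, then stay flat.
prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125), name="balancer.prob")
print(prob.name, prob.value_at(1461640.0))   # -> balancer.prob 0.125

A schedule of this shape would explain why every ans value logged here is constant: at batch_count around 1.46e6, deep into epochs 46 and 47, any early-training annealing has long since reached its final breakpoint, so the log simply repeats end-of-schedule values such as 0.125, 0.1, and 0.2.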
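The optim.py:484 warnings report five statistics of recent gradient norms (minimum, 25th percentile, median, 75th percentile, maximum). In every entry above, the printed threshold equals, up to rounding, Clipping_scale (2.0) times the middle value, e.g. 8.099e+01 = 2.0 x 4.049e+01, which suggests gradients are clipped against twice the recent median norm. A hedged sketch of such a clipper follows; GradNormClipper and its window parameter are hypothetical names for illustration, not ScaledAdam's actual code:

from collections import deque
import torch

class GradNormClipper:
    """Sketch only: track a window of recent global grad norms, clip
    against clipping_scale * median, and return the min/25%/median/75%/max
    statistics that the warnings above print."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def __call__(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.cat([p.grad.detach().flatten() for p in params]).norm().item()
        self.norms.append(norm)
        qs = torch.quantile(torch.tensor(list(self.norms)),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * qs[2].item()
        if norm > threshold:           # rescale all grads down to threshold
            for p in params:
                p.grad.mul_(threshold / norm)
        return qs.tolist(), threshold

Consistent with this reading, percent-clipped is 0.0 in almost every entry in this stretch (the norms sit in a narrow band around 40, well under the ~81 threshold), rising only briefly to 1.0 mid-epoch and to 7.0 in the noisier batches right after the epoch-47 restart.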
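The scaling.py:1022 Whitening lines compare a measured statistic of a module's activations (metric) against a scheduled limit; in this late stage of training the metrics sit at or below their limits (e.g. metric=14.45 vs. limit=15.0), consistent with well-conditioned activations. One plausible reading of the metric, offered as an assumption rather than scaling.py's exact formula, is a measure of how far the feature covariance is from a multiple of the identity:

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    """Assumed proxy for the logged 'metric': mean squared eigenvalue of
    the feature covariance divided by the squared mean eigenvalue."""
    x = x - x.mean(dim=0, keepdim=True)   # x: (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)     # eigenvalues of the symmetric cov
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

By Cauchy-Schwarz this ratio is at least 1.0, with equality exactly when all eigenvalues are equal, i.e. when the features are perfectly white; a corrective penalty is presumably applied only when the metric exceeds its limit.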
2023-12-24 02:43:30,567 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 02:43:32,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1461573.3333333333, ans=0.035 2023-12-24 02:43:41,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1461640.0, ans=0.125 2023-12-24 02:43:41,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1461640.0, ans=0.125 2023-12-24 02:43:44,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1461640.0, ans=0.2 2023-12-24 02:43:50,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1461706.6666666667, ans=0.2 2023-12-24 02:44:05,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1461773.3333333333, ans=0.1 2023-12-24 02:44:14,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1461840.0, ans=0.125 2023-12-24 02:44:14,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-12-24 02:44:22,408 INFO [train.py:886] (1/4) Epoch 47, batch 50, loss[loss=0.01636, audio_tagging_loss=0.01636, over 25000.00 frames. ], tot_loss[loss=0.0173, audio_tagging_loss=0.0173, over 1117482.66 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:44:28,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:44:34,687 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.488e+01 4.203e+01 4.907e+01 5.637e+01 1.199e+02, threshold=9.813e+01, percent-clipped=7.0 2023-12-24 02:44:36,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1461973.3333333333, ans=0.0 2023-12-24 02:44:40,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1461973.3333333333, ans=0.125 2023-12-24 02:44:42,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1462040.0, ans=0.125 2023-12-24 02:44:44,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1462040.0, ans=0.0 2023-12-24 02:44:56,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462106.6666666667, ans=0.1 2023-12-24 02:44:56,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1462106.6666666667, ans=0.2 2023-12-24 02:44:58,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1462106.6666666667, ans=0.125 2023-12-24 02:45:07,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.74 vs. 
limit=10.0 2023-12-24 02:45:08,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1462173.3333333333, ans=0.0 2023-12-24 02:45:09,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2023-12-24 02:45:13,771 INFO [train.py:886] (1/4) Epoch 47, batch 100, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 1974243.34 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:45:41,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1462373.3333333333, ans=0.2 2023-12-24 02:45:44,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1462440.0, ans=0.125 2023-12-24 02:45:44,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1462440.0, ans=0.125 2023-12-24 02:45:52,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1462440.0, ans=0.1 2023-12-24 02:45:55,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1462506.6666666667, ans=0.2 2023-12-24 02:45:58,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1462506.6666666667, ans=0.035 2023-12-24 02:45:59,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1462506.6666666667, ans=0.125 2023-12-24 02:46:05,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1462573.3333333333, ans=0.2 2023-12-24 02:46:05,864 INFO [train.py:886] (1/4) Epoch 47, batch 150, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 2636472.74 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:46:06,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1462573.3333333333, ans=0.125 2023-12-24 02:46:17,192 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.815e+01 4.110e+01 4.292e+01 4.596e+01 5.407e+01, threshold=8.583e+01, percent-clipped=0.0 2023-12-24 02:46:30,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1462706.6666666667, ans=0.125 2023-12-24 02:46:32,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2023-12-24 02:46:37,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1462773.3333333333, ans=0.0 2023-12-24 02:46:39,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=15.0 2023-12-24 02:46:54,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.90 vs. 
limit=15.0 2023-12-24 02:46:58,185 INFO [train.py:886] (1/4) Epoch 47, batch 200, loss[loss=0.009261, audio_tagging_loss=0.009261, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 3154501.15 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:47:11,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1462973.3333333333, ans=0.125 2023-12-24 02:47:17,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1463040.0, ans=0.1 2023-12-24 02:47:49,239 INFO [train.py:886] (1/4) Epoch 47, batch 250, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 3557770.71 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:48:01,231 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.589e+01 3.929e+01 4.138e+01 4.313e+01 4.926e+01, threshold=8.277e+01, percent-clipped=0.0 2023-12-24 02:48:11,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1463373.3333333333, ans=0.05 2023-12-24 02:48:14,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1463373.3333333333, ans=0.0 2023-12-24 02:48:26,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1463440.0, ans=0.125 2023-12-24 02:48:29,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1463506.6666666667, ans=0.125 2023-12-24 02:48:34,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2023-12-24 02:48:40,491 INFO [train.py:886] (1/4) Epoch 47, batch 300, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 3861487.98 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:48:45,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1463573.3333333333, ans=0.125 2023-12-24 02:48:50,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463640.0, ans=0.1 2023-12-24 02:49:00,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1463706.6666666667, ans=0.125 2023-12-24 02:49:31,969 INFO [train.py:886] (1/4) Epoch 47, batch 350, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4100398.20 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:49:32,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1463906.6666666667, ans=0.125 2023-12-24 02:49:37,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.82 vs. 
limit=15.0 2023-12-24 02:49:44,719 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.473e+01 3.948e+01 4.136e+01 4.344e+01 5.181e+01, threshold=8.273e+01, percent-clipped=0.0 2023-12-24 02:49:51,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1463973.3333333333, ans=0.1 2023-12-24 02:50:15,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-12-24 02:50:21,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1464173.3333333333, ans=0.0 2023-12-24 02:50:21,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1464173.3333333333, ans=0.125 2023-12-24 02:50:24,245 INFO [train.py:886] (1/4) Epoch 47, batch 400, loss[loss=0.008786, audio_tagging_loss=0.008786, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4285503.61 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:50:27,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1464240.0, ans=0.0 2023-12-24 02:50:37,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1464306.6666666667, ans=0.0 2023-12-24 02:50:40,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1464306.6666666667, ans=0.125 2023-12-24 02:50:57,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-12-24 02:51:02,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1464440.0, ans=0.125 2023-12-24 02:51:02,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1464440.0, ans=0.2 2023-12-24 02:51:07,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1464506.6666666667, ans=0.0 2023-12-24 02:51:16,132 INFO [train.py:886] (1/4) Epoch 47, batch 450, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4428588.05 frames. 
], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:51:28,925 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.902e+01 4.040e+01 4.251e+01 5.082e+01, threshold=8.080e+01, percent-clipped=0.0 2023-12-24 02:51:30,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1464640.0, ans=0.2 2023-12-24 02:51:44,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1464706.6666666667, ans=0.1 2023-12-24 02:51:54,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1464773.3333333333, ans=0.125 2023-12-24 02:52:02,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464840.0, ans=0.1 2023-12-24 02:52:03,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1464840.0, ans=0.2 2023-12-24 02:52:06,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-12-24 02:52:07,979 INFO [train.py:886] (1/4) Epoch 47, batch 500, loss[loss=0.01036, audio_tagging_loss=0.01036, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4541209.36 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:52:19,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1464973.3333333333, ans=0.125 2023-12-24 02:52:22,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1464973.3333333333, ans=0.125 2023-12-24 02:52:31,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2023-12-24 02:52:32,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1465040.0, ans=0.0 2023-12-24 02:52:36,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1465040.0, ans=0.2 2023-12-24 02:52:45,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1465106.6666666667, ans=0.0 2023-12-24 02:52:56,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1465173.3333333333, ans=0.0 2023-12-24 02:53:00,338 INFO [train.py:886] (1/4) Epoch 47, batch 550, loss[loss=0.008997, audio_tagging_loss=0.008997, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4639262.07 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:53:12,417 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.441e+01 3.969e+01 4.099e+01 4.262e+01 5.027e+01, threshold=8.197e+01, percent-clipped=0.0 2023-12-24 02:53:51,814 INFO [train.py:886] (1/4) Epoch 47, batch 600, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4709583.71 frames. 
2023-12-24 02:54:08,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1465640.0, ans=0.0
2023-12-24 02:54:36,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1465840.0, ans=0.1
2023-12-24 02:54:43,460 INFO [train.py:886] (1/4) Epoch 47, batch 650, loss[loss=0.01124, audio_tagging_loss=0.01124, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4760301.98 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 02:54:55,403 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.474e+01 3.862e+01 4.034e+01 4.310e+01 5.761e+01, threshold=8.068e+01, percent-clipped=0.0
2023-12-24 02:54:56,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1465973.3333333333, ans=0.0
2023-12-24 02:55:14,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0
2023-12-24 02:55:14,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1466106.6666666667, ans=0.2
2023-12-24 02:55:26,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5
2023-12-24 02:55:34,952 INFO [train.py:886] (1/4) Epoch 47, batch 700, loss[loss=0.009939, audio_tagging_loss=0.009939, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4797544.67 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 02:55:45,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1466306.6666666667, ans=0.125
2023-12-24 02:55:55,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1466373.3333333333, ans=0.0
2023-12-24 02:55:57,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1466373.3333333333, ans=0.0
2023-12-24 02:56:01,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1466373.3333333333, ans=0.125
2023-12-24 02:56:06,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466440.0, ans=0.1
2023-12-24 02:56:06,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1466440.0, ans=0.2
2023-12-24 02:56:23,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1466506.6666666667, ans=0.0
2023-12-24 02:56:26,192 INFO [train.py:886] (1/4) Epoch 47, batch 750, loss[loss=0.00914, audio_tagging_loss=0.00914, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4830832.57 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
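The scaling.py:213 lines trace ScheduledFloat values: module hyperparameters (dropout probabilities, skip rates, balancer probabilities) that are functions of the training step rather than constants. By batch_count ≈ 1.46M every schedule shown here has settled at its final value. A minimal piecewise-linear stand-in; the breakpoints below are invented for illustration and icefall's per-module schedules differ:

```python
class ScheduledFloat:
    """Float hyperparameter interpolated piecewise-linearly in batch_count."""
    def __init__(self, *points):          # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]                 # past the last breakpoint: constant

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # made-up breakpoints
print(dropout_p.value(1463973.33))  # -> 0.1, the settled value in the log
```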
2023-12-24 02:56:38,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.880e+01 4.112e+01 4.307e+01 5.752e+01, threshold=8.224e+01, percent-clipped=0.0
2023-12-24 02:56:47,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5
2023-12-24 02:57:03,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1466773.3333333333, ans=0.0
2023-12-24 02:57:05,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1466773.3333333333, ans=0.125
2023-12-24 02:57:13,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0
2023-12-24 02:57:20,479 INFO [train.py:886] (1/4) Epoch 47, batch 800, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4852208.70 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 02:57:27,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1466906.6666666667, ans=0.0
2023-12-24 02:58:12,284 INFO [train.py:886] (1/4) Epoch 47, batch 850, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4878373.53 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 02:58:16,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1467240.0, ans=0.125
2023-12-24 02:58:25,026 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.509e+01 3.885e+01 4.046e+01 4.247e+01 5.015e+01, threshold=8.092e+01, percent-clipped=0.0
2023-12-24 02:58:27,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1467306.6666666667, ans=0.1
2023-12-24 02:59:03,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1467573.3333333333, ans=0.125
2023-12-24 02:59:04,300 INFO [train.py:886] (1/4) Epoch 47, batch 900, loss[loss=0.008826, audio_tagging_loss=0.008826, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4897335.94 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 02:59:11,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1467573.3333333333, ans=10.0
2023-12-24 02:59:19,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1467640.0, ans=0.125
2023-12-24 02:59:28,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5
2023-12-24 02:59:36,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1467773.3333333333, ans=0.125
2023-12-24 02:59:38,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1467773.3333333333, ans=0.0
2023-12-24 02:59:49,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1467840.0, ans=0.5
2023-12-24 02:59:50,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1467840.0, ans=0.0
2023-12-24 02:59:51,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1467840.0, ans=0.125
2023-12-24 02:59:54,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1467840.0, ans=0.125
2023-12-24 02:59:56,685 INFO [train.py:886] (1/4) Epoch 47, batch 950, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4907411.69 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:00:08,635 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.658e+01 3.944e+01 4.162e+01 4.322e+01 5.155e+01, threshold=8.324e+01, percent-clipped=0.0
2023-12-24 03:00:12,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1467973.3333333333, ans=0.0
2023-12-24 03:00:18,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1468040.0, ans=0.125
2023-12-24 03:00:19,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1468040.0, ans=0.125
2023-12-24 03:00:25,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1468040.0, ans=0.125
2023-12-24 03:00:29,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0
2023-12-24 03:00:37,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1468173.3333333333, ans=0.2
2023-12-24 03:00:48,899 INFO [train.py:886] (1/4) Epoch 47, batch 1000, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4916617.22 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:00:54,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0
2023-12-24 03:01:08,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1468373.3333333333, ans=0.07
2023-12-24 03:01:27,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1468440.0, ans=0.0
2023-12-24 03:01:40,687 INFO [train.py:886] (1/4) Epoch 47, batch 1050, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4926706.29 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
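The scaling.py:1022 lines report a whitening diagnostic: how anisotropic the channel covariance of a module's output is, compared against a limit (1.0 would be perfectly "white"; the log only prints cases approaching or exceeding the limit, where the Whiten module pushes the statistic back down). One plausible formulation of the logged metric is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the covariance; treat the exact formula below as an assumption:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Anisotropy of the channel covariance: mean(eig^2) / mean(eig)^2,
    computed per channel group. Equals 1.0 when all eigenvalues are equal.
    Assumption-based sketch of the 'metric=... vs. limit=...' quantity."""
    x = x.reshape(-1, x.shape[-1])                 # (frames, channels)
    c = x.shape[-1] // num_groups
    x = x.reshape(-1, num_groups, c).transpose(0, 1)   # (groups, frames, c)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / x.shape[1]       # per-group covariance
    eig_mean = torch.diagonal(cov, dim1=1, dim2=2).mean()        # mean eigenvalue
    eig2_mean = (cov * cov).sum(dim=(1, 2)).mean() / c           # mean eigenvalue^2
    return (eig2_mean / eig_mean.clamp(min=1e-20) ** 2).item()

print(whitening_metric(torch.randn(1000, 384)))  # ~1.0 for white noise
```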
2023-12-24 03:01:40,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1468573.3333333333, ans=0.2
2023-12-24 03:01:46,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1468573.3333333333, ans=0.125
2023-12-24 03:01:53,565 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.568e+01 3.919e+01 4.119e+01 4.342e+01 4.813e+01, threshold=8.238e+01, percent-clipped=0.0
2023-12-24 03:02:02,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=22.5
2023-12-24 03:02:06,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1468706.6666666667, ans=0.025
2023-12-24 03:02:10,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1468706.6666666667, ans=0.125
2023-12-24 03:02:15,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0
2023-12-24 03:02:21,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1468840.0, ans=0.0
2023-12-24 03:02:32,948 INFO [train.py:886] (1/4) Epoch 47, batch 1100, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4934743.64 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:02:47,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1468973.3333333333, ans=0.125
2023-12-24 03:02:53,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1469040.0, ans=0.5
2023-12-24 03:03:23,789 INFO [train.py:886] (1/4) Epoch 47, batch 1150, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4934843.62 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:03:37,300 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.450e+01 3.892e+01 4.065e+01 4.219e+01 4.911e+01, threshold=8.130e+01, percent-clipped=0.0
2023-12-24 03:03:38,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1469306.6666666667, ans=0.2
2023-12-24 03:03:43,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0
2023-12-24 03:03:50,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1469373.3333333333, ans=0.0
2023-12-24 03:04:08,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1469506.6666666667, ans=0.0
2023-12-24 03:04:17,411 INFO [train.py:886] (1/4) Epoch 47, batch 1200, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4942428.95 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:04:24,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0
2023-12-24 03:04:26,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1469640.0, ans=0.125
2023-12-24 03:04:35,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1469640.0, ans=0.1
2023-12-24 03:04:38,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1469706.6666666667, ans=0.125
2023-12-24 03:04:40,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2023-12-24 03:04:50,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1469773.3333333333, ans=0.0
2023-12-24 03:04:54,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1469773.3333333333, ans=0.2
2023-12-24 03:05:07,737 INFO [train.py:886] (1/4) Epoch 47, batch 1250, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4939488.29 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:05:11,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5
2023-12-24 03:05:15,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1469906.6666666667, ans=0.125
2023-12-24 03:05:20,731 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 3.991e+01 4.155e+01 4.310e+01 5.132e+01, threshold=8.310e+01, percent-clipped=0.0
2023-12-24 03:05:21,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1469973.3333333333, ans=0.0
2023-12-24 03:05:26,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1469973.3333333333, ans=0.2
2023-12-24 03:05:30,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1470040.0, ans=0.0
2023-12-24 03:05:54,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1470173.3333333333, ans=0.1
2023-12-24 03:05:57,721 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:05:59,449 INFO [train.py:886] (1/4) Epoch 47, batch 1300, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4931789.83 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0
2023-12-24 03:06:02,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1470240.0, ans=0.125
2023-12-24 03:06:09,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1470306.6666666667, ans=0.0
2023-12-24 03:06:16,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1470306.6666666667, ans=0.0
2023-12-24 03:06:17,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1470306.6666666667, ans=0.0
2023-12-24 03:06:31,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0
2023-12-24 03:06:34,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1470440.0, ans=0.125
2023-12-24 03:06:49,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.01 vs. limit=22.5
2023-12-24 03:06:52,379 INFO [train.py:886] (1/4) Epoch 47, batch 1350, loss[loss=0.009984, audio_tagging_loss=0.009984, over 24750.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4931541.27 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:07:03,684 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.555e+01 3.933e+01 4.111e+01 4.315e+01 5.636e+01, threshold=8.222e+01, percent-clipped=0.0
2023-12-24 03:07:04,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1470640.0, ans=0.2
2023-12-24 03:07:25,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1470773.3333333333, ans=0.125
2023-12-24 03:07:27,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2023-12-24 03:07:36,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0
2023-12-24 03:07:43,727 INFO [train.py:886] (1/4) Epoch 47, batch 1400, loss[loss=0.009301, audio_tagging_loss=0.009301, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4936408.77 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:07:46,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1470906.6666666667, ans=0.0
2023-12-24 03:08:17,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=15.0
2023-12-24 03:08:34,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1471173.3333333333, ans=0.125
2023-12-24 03:08:35,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1471240.0, ans=0.0
2023-12-24 03:08:35,972 INFO [train.py:886] (1/4) Epoch 47, batch 1450, loss[loss=0.009744, audio_tagging_loss=0.009744, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4937798.11 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
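The lr column ticks down from 2.29e-03 to 2.28e-03 within epoch 47 (around batch 1350 above): Zipformer recipes use an Eden-style schedule that decays polynomially in both the batch index and the (fractional) epoch. A sketch of that formula from memory; check icefall's optim.py for the authoritative version, and treat the parameter values as typical recipe defaults rather than quotes:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style LR: smooth ~batch^-0.5 * epoch^-0.5 decay past the
    'knees' at lr_batches and lr_epochs."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With base_lr=0.045 this yields ~2.29e-03 at epoch 47, batch ~215,000,
# consistent with the log; the batch index here is a back-solved guess.
print(f"{eden_lr(0.045, 215_000, 47):.2e}")
```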
2023-12-24 03:08:39,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1471240.0, ans=0.2
2023-12-24 03:08:41,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1471240.0, ans=0.1
2023-12-24 03:08:41,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1471240.0, ans=0.2
2023-12-24 03:08:41,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1471240.0, ans=0.125
2023-12-24 03:08:48,109 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.547e+01 3.853e+01 4.015e+01 4.195e+01 8.350e+01, threshold=8.030e+01, percent-clipped=1.0
2023-12-24 03:09:00,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1471373.3333333333, ans=0.0
2023-12-24 03:09:26,254 INFO [train.py:886] (1/4) Epoch 47, batch 1500, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4945190.72 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:09:31,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=8.0
2023-12-24 03:09:50,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1471706.6666666667, ans=0.2
2023-12-24 03:10:17,898 INFO [train.py:886] (1/4) Epoch 47, batch 1550, loss[loss=0.008808, audio_tagging_loss=0.008808, over 24041.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4941611.16 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:10:27,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1471973.3333333333, ans=0.125
2023-12-24 03:10:27,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1471973.3333333333, ans=0.0
2023-12-24 03:10:29,799 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.710e+01 4.030e+01 4.186e+01 4.353e+01 4.618e+01, threshold=8.371e+01, percent-clipped=0.0
2023-12-24 03:10:36,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1471973.3333333333, ans=0.0
2023-12-24 03:10:54,445 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:10:59,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1472173.3333333333, ans=0.125
2023-12-24 03:10:59,310 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:11:04,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=15.0
2023-12-24 03:11:10,585 INFO [train.py:886] (1/4) Epoch 47, batch 1600, loss[loss=0.01067, audio_tagging_loss=0.01067, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4937846.76 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:11:13,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0
2023-12-24 03:11:19,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1472306.6666666667, ans=0.1
2023-12-24 03:11:45,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1472440.0, ans=0.125
2023-12-24 03:11:48,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1472440.0, ans=0.1
2023-12-24 03:11:59,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1472506.6666666667, ans=0.0
2023-12-24 03:12:01,424 INFO [train.py:886] (1/4) Epoch 47, batch 1650, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4938985.73 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:12:12,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=12.0
2023-12-24 03:12:14,046 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.647e+01 4.031e+01 4.196e+01 4.409e+01 5.344e+01, threshold=8.391e+01, percent-clipped=0.0
2023-12-24 03:12:19,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1472640.0, ans=0.2
2023-12-24 03:12:28,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1472706.6666666667, ans=0.0
2023-12-24 03:12:29,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1472706.6666666667, ans=0.5
2023-12-24 03:12:37,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0
2023-12-24 03:12:48,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1472840.0, ans=0.125
2023-12-24 03:12:51,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1472906.6666666667, ans=0.09899494936611666
2023-12-24 03:12:52,664 INFO [train.py:886] (1/4) Epoch 47, batch 1700, loss[loss=0.008094, audio_tagging_loss=0.008094, over 22371.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4938709.49 frames. ], batch size: 107, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:13:03,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1472973.3333333333, ans=0.2
2023-12-24 03:13:03,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1472973.3333333333, ans=0.125
2023-12-24 03:13:12,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1473040.0, ans=0.1
2023-12-24 03:13:16,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1473040.0, ans=0.125
2023-12-24 03:13:16,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1473040.0, ans=0.2
2023-12-24 03:13:16,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1473040.0, ans=0.0
2023-12-24 03:13:43,961 INFO [train.py:886] (1/4) Epoch 47, batch 1750, loss[loss=0.01013, audio_tagging_loss=0.01013, over 22335.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4937661.71 frames. ], batch size: 107, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:13:49,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1473240.0, ans=0.125
2023-12-24 03:13:56,767 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 3.924e+01 4.095e+01 4.271e+01 4.874e+01, threshold=8.190e+01, percent-clipped=0.0
2023-12-24 03:14:02,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1473306.6666666667, ans=0.125
2023-12-24 03:14:03,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
2023-12-24 03:14:22,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1473440.0, ans=0.2
2023-12-24 03:14:29,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=22.5
2023-12-24 03:14:35,525 INFO [train.py:886] (1/4) Epoch 47, batch 1800, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4948882.54 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:14:50,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1473640.0, ans=0.2
2023-12-24 03:15:22,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0
2023-12-24 03:15:27,765 INFO [train.py:886] (1/4) Epoch 47, batch 1850, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4947692.14 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
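Entries such as conv_module2.balancer1.prob, ...min_positive, and ...max_abs belong to activation balancers: regularizers that are identity on the forward value but, with probability prob, nudge per-channel statistics (fraction of positive values, mean absolute value) back inside configured bounds. A simplified sketch; the real scaling.py version edits gradients directly in backward(), and the soft positive-fraction proxy below is my own substitution:

```python
import random
import torch

class BalancerSketch(torch.nn.Module):
    """Illustrative stand-in for the balancer modules named in the log."""
    def __init__(self, min_positive=0.05, max_positive=0.95,
                 min_abs=0.02, max_abs=10.0, prob=0.125, penalty=0.04):
        super().__init__()
        self.min_positive, self.max_positive = min_positive, max_positive
        self.min_abs, self.max_abs = min_abs, max_abs
        self.prob, self.penalty = prob, penalty

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or random.random() >= self.prob:
            return x
        dims = tuple(range(x.dim() - 1))              # all but channel dim
        soft_pos = torch.sigmoid(4.0 * x).mean(dims)  # soft fraction positive
        mean_abs = x.abs().mean(dims)
        loss = ((soft_pos - soft_pos.clamp(self.min_positive, self.max_positive)).pow(2).sum()
                + (mean_abs - mean_abs.clamp(self.min_abs, self.max_abs)).pow(2).sum())
        # (loss - loss.detach()) is 0.0 in value but carries d(loss) in backward.
        return x + self.penalty * (loss - loss.detach())
```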
2023-12-24 03:15:39,846 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.565e+01 3.927e+01 4.095e+01 4.266e+01 4.764e+01, threshold=8.189e+01, percent-clipped=0.0
2023-12-24 03:16:18,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1474173.3333333333, ans=0.125
2023-12-24 03:16:19,734 INFO [train.py:886] (1/4) Epoch 47, batch 1900, loss[loss=0.01028, audio_tagging_loss=0.01028, over 24072.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4941350.54 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:16:34,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5
2023-12-24 03:16:43,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1474373.3333333333, ans=0.0
2023-12-24 03:16:58,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1474440.0, ans=0.07
2023-12-24 03:17:05,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1474506.6666666667, ans=0.5
2023-12-24 03:17:12,115 INFO [train.py:886] (1/4) Epoch 47, batch 1950, loss[loss=0.008826, audio_tagging_loss=0.008826, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4939577.80 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:17:16,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1474573.3333333333, ans=0.125
2023-12-24 03:17:20,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1474573.3333333333, ans=0.125
2023-12-24 03:17:24,065 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.604e+01 3.930e+01 4.132e+01 4.340e+01 4.631e+01, threshold=8.265e+01, percent-clipped=0.0
2023-12-24 03:17:27,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1474640.0, ans=0.0
2023-12-24 03:18:04,040 INFO [train.py:886] (1/4) Epoch 47, batch 2000, loss[loss=0.008462, audio_tagging_loss=0.008462, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4945343.56 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:18:09,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1474906.6666666667, ans=0.125
2023-12-24 03:18:34,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1475106.6666666667, ans=0.1
2023-12-24 03:18:42,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1475106.6666666667, ans=0.125
2023-12-24 03:18:46,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1475173.3333333333, ans=0.125
2023-12-24 03:18:56,081 INFO [train.py:886] (1/4) Epoch 47, batch 2050, loss[loss=0.008937, audio_tagging_loss=0.008937, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4948111.76 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:19:07,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1475306.6666666667, ans=0.125
2023-12-24 03:19:09,101 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.482e+01 3.891e+01 4.061e+01 4.208e+01 4.839e+01, threshold=8.122e+01, percent-clipped=0.0
2023-12-24 03:19:09,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1475306.6666666667, ans=0.0
2023-12-24 03:19:11,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1475306.6666666667, ans=0.1
2023-12-24 03:19:32,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0
2023-12-24 03:19:47,637 INFO [train.py:886] (1/4) Epoch 47, batch 2100, loss[loss=0.01142, audio_tagging_loss=0.01142, over 24750.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4948860.61 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:19:52,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0
2023-12-24 03:20:02,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0
2023-12-24 03:20:05,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1475640.0, ans=0.0
2023-12-24 03:20:19,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1475773.3333333333, ans=0.0
2023-12-24 03:20:38,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1475906.6666666667, ans=0.2
2023-12-24 03:20:38,857 INFO [train.py:886] (1/4) Epoch 47, batch 2150, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4953029.08 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:20:45,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1475906.6666666667, ans=0.1
2023-12-24 03:20:52,742 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.593e+01 3.982e+01 4.167e+01 4.321e+01 5.208e+01, threshold=8.335e+01, percent-clipped=0.0
2023-12-24 03:21:06,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0
2023-12-24 03:21:22,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1476173.3333333333, ans=0.2
2023-12-24 03:21:30,442 INFO [train.py:886] (1/4) Epoch 47, batch 2200, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4949344.45 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:21:36,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1476240.0, ans=10.0
2023-12-24 03:21:43,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1476306.6666666667, ans=0.035
2023-12-24 03:21:52,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0
2023-12-24 03:22:23,340 INFO [train.py:886] (1/4) Epoch 47, batch 2250, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4943596.31 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:22:26,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1476573.3333333333, ans=0.1
2023-12-24 03:22:35,566 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.670e+01 3.897e+01 4.118e+01 4.297e+01 5.377e+01, threshold=8.236e+01, percent-clipped=0.0
2023-12-24 03:22:53,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1476773.3333333333, ans=0.125
2023-12-24 03:22:55,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1476773.3333333333, ans=0.0
2023-12-24 03:22:55,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0
2023-12-24 03:22:58,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1476773.3333333333, ans=0.0
2023-12-24 03:23:02,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1476840.0, ans=0.1
2023-12-24 03:23:14,279 INFO [train.py:886] (1/4) Epoch 47, batch 2300, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4946243.37 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:23:22,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1476906.6666666667, ans=0.0
2023-12-24 03:23:33,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1477040.0, ans=0.125
2023-12-24 03:23:43,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0
2023-12-24 03:23:54,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0
2023-12-24 03:23:54,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1477173.3333333333, ans=0.0
2023-12-24 03:24:05,702 INFO [train.py:886] (1/4) Epoch 47, batch 2350, loss[loss=0.01083, audio_tagging_loss=0.01083, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4951188.10 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
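The bypass.scale_min and bypass.skip_rate entries describe the residual plumbing of each block: the output is a learned per-channel convex combination of the block input and its transformed output, with the combination weight clamped below by the scheduled scale_min (0.2 here), while skip_rate is a stochastic-depth probability for dropping the block's contribution during training (mostly annealed toward zero, with small residual values such as 0.035 and 0.07 in these lines). A simplified sketch of both mechanisms:

```python
import torch

class BypassSketch(torch.nn.Module):
    """Sketch of y = (1 - s) * x + s * f(x) with s in [scale_min, 1.0],
    plus optional stochastic-depth skipping. Simplified illustration of
    the logged knobs, not the actual zipformer module."""
    def __init__(self, num_channels: int, scale_min: float = 0.2,
                 skip_rate: float = 0.0):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min, self.skip_rate = scale_min, skip_rate

    def forward(self, x: torch.Tensor, f_x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                               # skip the whole block
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (f_x - x)                   # == (1 - s) * x + s * f_x
```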
2023-12-24 03:24:17,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1477306.6666666667, ans=0.0
2023-12-24 03:24:19,383 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.580e+01 3.910e+01 4.055e+01 4.262e+01 5.306e+01, threshold=8.110e+01, percent-clipped=0.0
2023-12-24 03:24:25,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1477306.6666666667, ans=0.125
2023-12-24 03:24:42,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1477440.0, ans=0.1
2023-12-24 03:24:48,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477506.6666666667, ans=0.1
2023-12-24 03:24:58,090 INFO [train.py:886] (1/4) Epoch 47, batch 2400, loss[loss=0.009581, audio_tagging_loss=0.009581, over 22482.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4944052.64 frames. ], batch size: 107, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:25:18,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0
2023-12-24 03:25:20,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1477706.6666666667, ans=0.125
2023-12-24 03:25:49,203 INFO [train.py:886] (1/4) Epoch 47, batch 2450, loss[loss=0.01089, audio_tagging_loss=0.01089, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4947945.00 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:25:54,902 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:25:55,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477906.6666666667, ans=0.1
2023-12-24 03:26:03,620 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.970e+01 4.140e+01 4.271e+01 4.902e+01, threshold=8.281e+01, percent-clipped=0.0
2023-12-24 03:26:13,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1478040.0, ans=0.125
2023-12-24 03:26:29,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1478106.6666666667, ans=0.125
2023-12-24 03:26:42,136 INFO [train.py:886] (1/4) Epoch 47, batch 2500, loss[loss=0.01455, audio_tagging_loss=0.01455, over 24961.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4949882.71 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:26:44,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1478240.0, ans=0.1
2023-12-24 03:26:47,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=6.0
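The scaling.py:1118 lines ("WithLoss: name=...self_attn_weights, loss-sum=0.000e+00") report an auxiliary penalty attached to attention weights; a loss-sum of zero means the penalty did not fire on that batch. A sketch of the usual trick for attaching such a penalty without changing the forward value; the function name and exact form are assumptions based on the log, not a quote of the implementation:

```python
import torch

def penalize_abs_values_gt(x: torch.Tensor, limit: float,
                           penalty: float) -> torch.Tensor:
    """Identity on the forward value; attention weights whose |value|
    exceeds `limit` contribute an auxiliary loss through backward."""
    aux = penalty * (x.abs() - limit).clamp(min=0.0).sum()
    # Zero-valued residual: adds 0.0 in forward, but grad(aux) in backward.
    # When all |x| <= limit, aux == 0, i.e. the logged 'loss-sum=0.000e+00'.
    return x + (aux - aux.detach())
```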
2023-12-24 03:27:03,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1478373.3333333333, ans=0.0
2023-12-24 03:27:19,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1478440.0, ans=0.2
2023-12-24 03:27:19,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1478440.0, ans=0.125
2023-12-24 03:27:33,050 INFO [train.py:886] (1/4) Epoch 47, batch 2550, loss[loss=0.01285, audio_tagging_loss=0.01285, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4946184.52 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:27:39,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5
2023-12-24 03:27:46,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5
2023-12-24 03:27:47,510 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+01 3.962e+01 4.101e+01 4.307e+01 5.190e+01, threshold=8.202e+01, percent-clipped=0.0
2023-12-24 03:28:24,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1478906.6666666667, ans=0.125
2023-12-24 03:28:25,401 INFO [train.py:886] (1/4) Epoch 47, batch 2600, loss[loss=0.009652, audio_tagging_loss=0.009652, over 24750.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4949694.14 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:28:29,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1478906.6666666667, ans=0.125
2023-12-24 03:28:29,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1478906.6666666667, ans=0.0
2023-12-24 03:28:31,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1478906.6666666667, ans=0.125
2023-12-24 03:28:47,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1479040.0, ans=0.125
2023-12-24 03:28:49,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1479040.0, ans=0.09899494936611666
2023-12-24 03:28:58,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1479106.6666666667, ans=0.5
2023-12-24 03:28:59,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=22.5
2023-12-24 03:29:09,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1479173.3333333333, ans=0.2
2023-12-24 03:29:17,351 INFO [train.py:886] (1/4) Epoch 47, batch 2650, loss[loss=0.009589, audio_tagging_loss=0.009589, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4953431.39 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:29:25,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1479240.0, ans=0.07
2023-12-24 03:29:30,367 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.599e+01 3.913e+01 4.117e+01 4.340e+01 5.739e+01, threshold=8.234e+01, percent-clipped=0.0
2023-12-24 03:29:58,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1479506.6666666667, ans=0.125
2023-12-24 03:30:08,771 INFO [train.py:886] (1/4) Epoch 47, batch 2700, loss[loss=0.009855, audio_tagging_loss=0.009855, over 21738.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4951579.83 frames. ], batch size: 107, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:30:08,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1479573.3333333333, ans=0.1
2023-12-24 03:30:18,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1479640.0, ans=0.0
2023-12-24 03:30:25,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.37 vs. limit=5.0
2023-12-24 03:30:51,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0
2023-12-24 03:31:01,099 INFO [train.py:886] (1/4) Epoch 47, batch 2750, loss[loss=0.01187, audio_tagging_loss=0.01187, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4958033.94 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:31:10,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1479973.3333333333, ans=0.2
2023-12-24 03:31:13,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0
2023-12-24 03:31:14,035 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.907e+01 4.081e+01 4.248e+01 4.984e+01, threshold=8.163e+01, percent-clipped=0.0
2023-12-24 03:31:30,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0
2023-12-24 03:31:51,820 INFO [train.py:886] (1/4) Epoch 47, batch 2800, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4960577.90 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:32:17,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1480373.3333333333, ans=0.125
2023-12-24 03:32:43,758 INFO [train.py:886] (1/4) Epoch 47, batch 2850, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4957042.19 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
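Every train.py:886 line ends with grad_scale: 32.0, the loss scale of the fp16 gradient scaler; it has been stable across this whole window, meaning no overflow has forced a shrink. The standard PyTorch AMP pattern behind that column, as a generic sketch rather than the recipe's exact loop:

```python
import torch

# GradScaler multiplies the loss before backward to keep fp16 gradients
# representable, then unscales before the optimizer step; its scale only
# changes on overflow (shrink) or after a long overflow-free run (grow).
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def fp16_step(model, batch, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)              # audio_tagging_loss in this recipe
    scaler.scale(loss).backward()
    scaler.step(optimizer)               # unscales grads, then steps
    scaler.update()                      # adjusts the scale if needed
    return loss.detach(), scaler.get_scale()   # get_scale() -> 32.0 here
```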
2023-12-24 03:32:54,105 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:32:56,596 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.697e+01 4.004e+01 4.137e+01 4.360e+01 4.931e+01, threshold=8.275e+01, percent-clipped=0.0
2023-12-24 03:33:04,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1480706.6666666667, ans=0.0
2023-12-24 03:33:28,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1480840.0, ans=0.125
2023-12-24 03:33:30,523 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:33:34,980 INFO [train.py:886] (1/4) Epoch 47, batch 2900, loss[loss=0.01158, audio_tagging_loss=0.01158, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4953391.68 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:33:57,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2023-12-24 03:33:59,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1481040.0, ans=0.125
2023-12-24 03:34:05,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0
2023-12-24 03:34:12,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5
2023-12-24 03:34:20,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1481173.3333333333, ans=0.04949747468305833
2023-12-24 03:34:21,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1481173.3333333333, ans=0.1
2023-12-24 03:34:21,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1481173.3333333333, ans=10.0
2023-12-24 03:34:27,558 INFO [train.py:886] (1/4) Epoch 47, batch 2950, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4951527.57 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:34:36,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1481306.6666666667, ans=0.1
2023-12-24 03:34:41,282 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.488e+01 3.898e+01 4.063e+01 4.276e+01 4.870e+01, threshold=8.126e+01, percent-clipped=0.0
2023-12-24 03:35:18,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1481506.6666666667, ans=0.2
2023-12-24 03:35:18,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.08 vs. limit=22.5
2023-12-24 03:35:19,986 INFO [train.py:886] (1/4) Epoch 47, batch 3000, loss[loss=0.008534, audio_tagging_loss=0.008534, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4950217.47 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:35:19,987 INFO [train.py:909] (1/4) Computing validation loss
2023-12-24 03:35:41,609 INFO [train.py:917] (1/4) Epoch 47, validation: loss=0.03661, audio_tagging_loss=0.03661, over 3737520.00 frames.
2023-12-24 03:35:41,609 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-24 03:35:43,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1481573.3333333333, ans=0.125
2023-12-24 03:35:54,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1481640.0, ans=0.125
2023-12-24 03:36:10,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0
2023-12-24 03:36:11,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1481773.3333333333, ans=0.125
2023-12-24 03:36:13,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1481773.3333333333, ans=0.1
2023-12-24 03:36:33,030 INFO [train.py:886] (1/4) Epoch 47, batch 3050, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4951003.56 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:36:46,061 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.561e+01 3.890e+01 4.067e+01 4.265e+01 5.158e+01, threshold=8.135e+01, percent-clipped=0.0
2023-12-24 03:37:05,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1482106.6666666667, ans=0.04949747468305833
2023-12-24 03:37:11,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1482106.6666666667, ans=0.125
2023-12-24 03:37:25,366 INFO [train.py:886] (1/4) Epoch 47, batch 3100, loss[loss=0.01021, audio_tagging_loss=0.01021, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4959436.63 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:37:28,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1482240.0, ans=0.05
2023-12-24 03:37:31,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1482240.0, ans=15.0
2023-12-24 03:37:33,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1482240.0, ans=0.125
2023-12-24 03:37:46,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0
2023-12-24 03:37:54,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0
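At batch 3000 above, training pauses to compute validation loss over the full dev set (3,737,520 frames; validation loss 0.03661 against a running train loss of ~0.0107) and then reports peak GPU memory. A sketch of that step; the model interface returning (loss, num_frames) is an assumption:

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_dl, device) -> float:
    """Frame-weighted average loss over the dev loader, followed by a
    peak-memory report in the style of train.py:917-918."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_dl:
        loss, num_frames = model(batch)        # assumed interface
        tot_loss += loss.item() * num_frames
        tot_frames += num_frames
    model.train()
    max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"over {tot_frames:.2f} frames.")
    print(f"Maximum memory allocated so far is {max_mb}MB")
    return tot_loss / tot_frames
```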
2023-12-24 03:38:02,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1482440.0, ans=0.0
2023-12-24 03:38:03,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0
2023-12-24 03:38:16,247 INFO [train.py:886] (1/4) Epoch 47, batch 3150, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4957295.29 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:38:18,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1482573.3333333333, ans=0.125
2023-12-24 03:38:30,725 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.934e+01 4.165e+01 4.401e+01 5.350e+01, threshold=8.330e+01, percent-clipped=0.0
2023-12-24 03:38:36,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1482706.6666666667, ans=0.1
2023-12-24 03:39:08,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1482906.6666666667, ans=0.125
2023-12-24 03:39:08,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1482906.6666666667, ans=0.125
2023-12-24 03:39:08,899 INFO [train.py:886] (1/4) Epoch 47, batch 3200, loss[loss=0.008273, audio_tagging_loss=0.008273, over 25000.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4954725.10 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:39:12,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=12.0
2023-12-24 03:39:14,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0
2023-12-24 03:39:14,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1482906.6666666667, ans=0.125
2023-12-24 03:39:42,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1483106.6666666667, ans=0.125
2023-12-24 03:39:51,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1483173.3333333333, ans=0.0
2023-12-24 03:39:55,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0
2023-12-24 03:40:00,768 INFO [train.py:886] (1/4) Epoch 47, batch 3250, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4955480.31 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:40:14,365 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.530e+01 3.967e+01 4.152e+01 4.354e+01 4.796e+01, threshold=8.304e+01, percent-clipped=0.0 2023-12-24 03:40:45,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1483506.6666666667, ans=0.05 2023-12-24 03:40:52,247 INFO [train.py:886] (1/4) Epoch 47, batch 3300, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4957886.44 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:41:04,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1483640.0, ans=0.2 2023-12-24 03:41:04,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2023-12-24 03:41:10,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1483640.0, ans=0.95 2023-12-24 03:41:26,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1483773.3333333333, ans=0.2 2023-12-24 03:41:30,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1483773.3333333333, ans=10.0 2023-12-24 03:41:42,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1483906.6666666667, ans=0.02 2023-12-24 03:41:43,691 INFO [train.py:886] (1/4) Epoch 47, batch 3350, loss[loss=0.009949, audio_tagging_loss=0.009949, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4962329.76 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:41:57,457 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.599e+01 3.920e+01 4.118e+01 4.246e+01 4.815e+01, threshold=8.236e+01, percent-clipped=0.0 2023-12-24 03:42:09,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1484040.0, ans=0.1 2023-12-24 03:42:09,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1484040.0, ans=0.1 2023-12-24 03:42:11,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1484040.0, ans=0.125 2023-12-24 03:42:28,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1484173.3333333333, ans=0.0 2023-12-24 03:42:28,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1484173.3333333333, ans=0.025 2023-12-24 03:42:35,584 INFO [train.py:886] (1/4) Epoch 47, batch 3400, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4965477.62 frames. 
], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:42:54,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1484306.6666666667, ans=0.1 2023-12-24 03:43:03,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-12-24 03:43:15,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1484440.0, ans=0.125 2023-12-24 03:43:27,010 INFO [train.py:886] (1/4) Epoch 47, batch 3450, loss[loss=0.007784, audio_tagging_loss=0.007784, over 24055.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4957869.09 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:43:34,645 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:43:40,801 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.504e+01 3.946e+01 4.174e+01 4.346e+01 5.015e+01, threshold=8.347e+01, percent-clipped=0.0 2023-12-24 03:44:03,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1484773.3333333333, ans=0.0 2023-12-24 03:44:19,477 INFO [train.py:886] (1/4) Epoch 47, batch 3500, loss[loss=0.01099, audio_tagging_loss=0.01099, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4949414.75 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:44:25,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1484906.6666666667, ans=0.1 2023-12-24 03:44:26,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1484906.6666666667, ans=0.125 2023-12-24 03:44:26,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1484906.6666666667, ans=0.125 2023-12-24 03:44:50,727 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:44:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1485106.6666666667, ans=0.125 2023-12-24 03:45:06,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2023-12-24 03:45:10,896 INFO [train.py:886] (1/4) Epoch 47, batch 3550, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4940007.22 frames. 
], batch size: 99, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:45:13,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1485240.0, ans=0.125 2023-12-24 03:45:13,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1485240.0, ans=0.0 2023-12-24 03:45:24,590 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.931e+01 4.091e+01 4.261e+01 4.917e+01, threshold=8.182e+01, percent-clipped=0.0 2023-12-24 03:45:40,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1485373.3333333333, ans=0.125 2023-12-24 03:45:54,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2023-12-24 03:45:59,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1485506.6666666667, ans=0.0 2023-12-24 03:46:02,702 INFO [train.py:886] (1/4) Epoch 47, batch 3600, loss[loss=0.009918, audio_tagging_loss=0.009918, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4946504.82 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:46:19,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1485640.0, ans=0.0 2023-12-24 03:46:20,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.28 vs. limit=10.0 2023-12-24 03:46:27,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1485706.6666666667, ans=0.95 2023-12-24 03:46:27,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1485706.6666666667, ans=0.0 2023-12-24 03:46:40,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1485773.3333333333, ans=0.0 2023-12-24 03:46:55,309 INFO [train.py:886] (1/4) Epoch 47, batch 3650, loss[loss=0.009609, audio_tagging_loss=0.009609, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4950924.26 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:47:02,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.32 vs. limit=10.0 2023-12-24 03:47:08,293 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.641e+01 3.947e+01 4.112e+01 4.320e+01 4.973e+01, threshold=8.224e+01, percent-clipped=0.0 2023-12-24 03:47:24,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1486040.0, ans=0.125 2023-12-24 03:47:28,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486106.6666666667, ans=0.1 2023-12-24 03:47:39,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. 
limit=15.0 2023-12-24 03:47:46,826 INFO [train.py:886] (1/4) Epoch 47, batch 3700, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4956730.75 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:48:00,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-12-24 03:48:06,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1486306.6666666667, ans=0.125 2023-12-24 03:48:21,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1486440.0, ans=0.0 2023-12-24 03:48:39,070 INFO [train.py:886] (1/4) Epoch 47, batch 3750, loss[loss=0.01131, audio_tagging_loss=0.01131, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4960522.50 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:48:51,905 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.685e+01 4.010e+01 4.129e+01 4.399e+01 4.975e+01, threshold=8.259e+01, percent-clipped=0.0 2023-12-24 03:48:57,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.24 vs. limit=10.0 2023-12-24 03:49:24,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1486840.0, ans=0.0 2023-12-24 03:49:29,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1486906.6666666667, ans=0.5 2023-12-24 03:49:30,076 INFO [train.py:886] (1/4) Epoch 47, batch 3800, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4956303.70 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:49:46,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1486973.3333333333, ans=0.07 2023-12-24 03:49:59,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1487106.6666666667, ans=0.125 2023-12-24 03:50:12,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1487173.3333333333, ans=0.0 2023-12-24 03:50:22,433 INFO [train.py:886] (1/4) Epoch 47, batch 3850, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4954936.96 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:50:35,804 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 3.975e+01 4.146e+01 4.354e+01 5.243e+01, threshold=8.293e+01, percent-clipped=0.0 2023-12-24 03:50:46,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1487373.3333333333, ans=0.0 2023-12-24 03:51:03,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1487506.6666666667, ans=0.0 2023-12-24 03:51:15,181 INFO [train.py:886] (1/4) Epoch 47, batch 3900, loss[loss=0.009909, audio_tagging_loss=0.009909, over 25000.00 frames. 
], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4951224.17 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:51:18,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1487573.3333333333, ans=0.07 2023-12-24 03:51:58,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1487840.0, ans=0.0 2023-12-24 03:51:58,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1487840.0, ans=0.0 2023-12-24 03:51:58,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1487840.0, ans=0.1 2023-12-24 03:52:06,201 INFO [train.py:886] (1/4) Epoch 47, batch 3950, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4958191.36 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:52:08,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0 2023-12-24 03:52:14,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1487906.6666666667, ans=0.0 2023-12-24 03:52:19,822 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.576e+01 3.897e+01 4.099e+01 4.303e+01 4.809e+01, threshold=8.198e+01, percent-clipped=0.0 2023-12-24 03:52:21,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2023-12-24 03:52:24,978 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:52:35,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1488106.6666666667, ans=0.2 2023-12-24 03:52:38,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1488106.6666666667, ans=0.2 2023-12-24 03:52:51,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.94 vs. limit=15.0 2023-12-24 03:52:58,008 INFO [train.py:886] (1/4) Epoch 47, batch 4000, loss[loss=0.01069, audio_tagging_loss=0.01069, over 22607.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4960977.85 frames. ], batch size: 107, lr: 2.27e-03, grad_scale: 32.0 2023-12-24 03:52:58,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1488240.0, ans=0.04949747468305833 2023-12-24 03:53:09,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1488306.6666666667, ans=0.1 2023-12-24 03:53:10,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1488306.6666666667, ans=0.0 2023-12-24 03:53:44,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1488506.6666666667, ans=0.125 2023-12-24 03:53:49,973 INFO [train.py:886] (1/4) Epoch 47, batch 4050, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. 
], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4962875.04 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:54:03,599 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 4.007e+01 4.158e+01 4.362e+01 4.853e+01, threshold=8.315e+01, percent-clipped=0.0 2023-12-24 03:54:09,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1488640.0, ans=0.125 2023-12-24 03:54:10,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1488706.6666666667, ans=0.125 2023-12-24 03:54:24,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1488773.3333333333, ans=0.125 2023-12-24 03:54:34,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1488840.0, ans=0.125 2023-12-24 03:54:39,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1488840.0, ans=0.2 2023-12-24 03:54:41,999 INFO [train.py:886] (1/4) Epoch 47, batch 4100, loss[loss=0.01024, audio_tagging_loss=0.01024, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4957754.26 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:54:46,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1488906.6666666667, ans=0.125 2023-12-24 03:54:57,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1488973.3333333333, ans=0.125 2023-12-24 03:55:00,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1488973.3333333333, ans=0.02 2023-12-24 03:55:10,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1489040.0, ans=0.2 2023-12-24 03:55:33,582 INFO [train.py:886] (1/4) Epoch 47, batch 4150, loss[loss=0.01001, audio_tagging_loss=0.01001, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4948780.22 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:55:47,354 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.657e+01 3.988e+01 4.176e+01 4.439e+01 5.267e+01, threshold=8.351e+01, percent-clipped=0.0 2023-12-24 03:55:47,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1489306.6666666667, ans=0.125 2023-12-24 03:55:56,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-24 03:55:57,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1489373.3333333333, ans=0.1 2023-12-24 03:56:11,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1489440.0, ans=0.1 2023-12-24 03:56:25,312 INFO [train.py:886] (1/4) Epoch 47, batch 4200, loss[loss=0.009935, audio_tagging_loss=0.009935, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4955692.98 frames. 
], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:56:36,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1489640.0, ans=0.125 2023-12-24 03:56:55,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1489706.6666666667, ans=0.0 2023-12-24 03:56:56,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1489773.3333333333, ans=0.0 2023-12-24 03:56:58,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1489773.3333333333, ans=0.125 2023-12-24 03:57:18,708 INFO [train.py:886] (1/4) Epoch 47, batch 4250, loss[loss=0.01198, audio_tagging_loss=0.01198, over 22344.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4959800.12 frames. ], batch size: 107, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:57:28,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1489973.3333333333, ans=0.0 2023-12-24 03:57:29,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1489973.3333333333, ans=0.2 2023-12-24 03:57:31,101 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.425e+01 3.978e+01 4.120e+01 4.273e+01 4.787e+01, threshold=8.239e+01, percent-clipped=0.0 2023-12-24 03:58:05,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1490173.3333333333, ans=0.0 2023-12-24 03:58:09,477 INFO [train.py:886] (1/4) Epoch 47, batch 4300, loss[loss=0.008332, audio_tagging_loss=0.008332, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4956532.95 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:58:09,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1490240.0, ans=0.125 2023-12-24 03:58:10,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1490240.0, ans=0.125 2023-12-24 03:58:14,156 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:58:14,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1490240.0, ans=0.07 2023-12-24 03:58:26,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1490306.6666666667, ans=0.2 2023-12-24 03:58:26,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1490306.6666666667, ans=0.125 2023-12-24 03:58:29,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. 
limit=10.0 2023-12-24 03:58:42,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1490440.0, ans=0.2 2023-12-24 03:58:44,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1490440.0, ans=0.0 2023-12-24 03:58:52,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1490506.6666666667, ans=0.2 2023-12-24 03:59:01,782 INFO [train.py:886] (1/4) Epoch 47, batch 4350, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4962188.89 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:59:10,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0 2023-12-24 03:59:14,773 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.601e+01 4.048e+01 4.179e+01 4.362e+01 5.083e+01, threshold=8.358e+01, percent-clipped=0.0 2023-12-24 03:59:21,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1490706.6666666667, ans=0.0 2023-12-24 03:59:25,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1490706.6666666667, ans=0.0 2023-12-24 03:59:38,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1490773.3333333333, ans=0.2 2023-12-24 03:59:39,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0 2023-12-24 03:59:43,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1490840.0, ans=0.125 2023-12-24 03:59:44,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1490840.0, ans=0.1 2023-12-24 03:59:52,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1490840.0, ans=0.1 2023-12-24 03:59:53,618 INFO [train.py:886] (1/4) Epoch 47, batch 4400, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4957697.61 frames. 
], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 03:59:54,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1490906.6666666667, ans=0.2 2023-12-24 03:59:55,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1490906.6666666667, ans=0.125 2023-12-24 04:00:21,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1491040.0, ans=0.125 2023-12-24 04:00:25,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1491106.6666666667, ans=0.125 2023-12-24 04:00:30,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1491106.6666666667, ans=0.125 2023-12-24 04:00:37,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1491173.3333333333, ans=0.5 2023-12-24 04:00:37,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1491173.3333333333, ans=0.125 2023-12-24 04:00:45,147 INFO [train.py:886] (1/4) Epoch 47, batch 4450, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4955008.61 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:00:46,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1491240.0, ans=0.1 2023-12-24 04:00:58,157 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:00:58,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2023-12-24 04:00:58,840 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.617e+01 3.970e+01 4.177e+01 4.303e+01 4.832e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 04:01:11,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1491373.3333333333, ans=0.125 2023-12-24 04:01:12,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1491373.3333333333, ans=0.07 2023-12-24 04:01:30,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2023-12-24 04:01:37,668 INFO [train.py:886] (1/4) Epoch 47, batch 4500, loss[loss=0.009741, audio_tagging_loss=0.009741, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4954568.72 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:01:41,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. 
limit=15.0 2023-12-24 04:02:24,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1491840.0, ans=0.125 2023-12-24 04:02:29,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1491906.6666666667, ans=0.0 2023-12-24 04:02:30,056 INFO [train.py:886] (1/4) Epoch 47, batch 4550, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4954701.25 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:02:30,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1491906.6666666667, ans=0.125 2023-12-24 04:02:30,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-12-24 04:02:34,232 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:02:43,227 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.462e+01 3.976e+01 4.121e+01 4.329e+01 5.064e+01, threshold=8.243e+01, percent-clipped=0.0 2023-12-24 04:02:51,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-12-24 04:02:57,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2023-12-24 04:03:06,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-12-24 04:03:08,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1492106.6666666667, ans=0.2 2023-12-24 04:03:21,238 INFO [train.py:886] (1/4) Epoch 47, batch 4600, loss[loss=0.01147, audio_tagging_loss=0.01147, over 21762.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4957993.20 frames. ], batch size: 107, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:03:41,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2023-12-24 04:04:11,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1492506.6666666667, ans=0.0 2023-12-24 04:04:13,641 INFO [train.py:886] (1/4) Epoch 47, batch 4650, loss[loss=0.009839, audio_tagging_loss=0.009839, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4961724.38 frames. 
], batch size: 100, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:04:26,947 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.563e+01 3.987e+01 4.134e+01 4.270e+01 4.832e+01, threshold=8.268e+01, percent-clipped=0.0 2023-12-24 04:04:27,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1492640.0, ans=0.0 2023-12-24 04:04:28,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1492640.0, ans=0.0 2023-12-24 04:04:34,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1492706.6666666667, ans=0.0 2023-12-24 04:04:45,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1492773.3333333333, ans=0.0 2023-12-24 04:04:49,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1492773.3333333333, ans=0.1 2023-12-24 04:05:04,476 INFO [train.py:886] (1/4) Epoch 47, batch 4700, loss[loss=0.009351, audio_tagging_loss=0.009351, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4952463.72 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:05:15,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1492973.3333333333, ans=0.1 2023-12-24 04:05:15,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1492973.3333333333, ans=0.0 2023-12-24 04:05:20,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1492973.3333333333, ans=0.0 2023-12-24 04:05:30,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0 2023-12-24 04:05:34,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1493106.6666666667, ans=0.2 2023-12-24 04:05:37,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.29 vs. limit=10.0 2023-12-24 04:05:41,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2023-12-24 04:05:41,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:05:42,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0 2023-12-24 04:05:49,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1493173.3333333333, ans=0.0 2023-12-24 04:05:51,833 INFO [train.py:886] (1/4) Epoch 47, batch 4750, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4943205.82 frames. 
], batch size: 99, lr: 2.27e-03, grad_scale: 64.0 2023-12-24 04:05:55,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1493240.0, ans=0.125 2023-12-24 04:06:03,898 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.583e+01 4.050e+01 4.260e+01 4.432e+01 5.167e+01, threshold=8.521e+01, percent-clipped=0.0 2023-12-24 04:06:27,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1493346.6666666667, ans=0.125 2023-12-24 04:06:28,391 INFO [train.py:886] (1/4) Epoch 48, batch 0, loss[loss=0.02985, audio_tagging_loss=0.02985, over 20704.00 frames. ], tot_loss[loss=0.02985, audio_tagging_loss=0.02985, over 20704.00 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:06:28,392 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 04:06:49,465 INFO [train.py:917] (1/4) Epoch 48, validation: loss=0.03686, audio_tagging_loss=0.03686, over 3737520.00 frames. 2023-12-24 04:06:49,466 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 04:07:06,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1493413.3333333333, ans=0.0 2023-12-24 04:07:34,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2023-12-24 04:07:41,196 INFO [train.py:886] (1/4) Epoch 48, batch 50, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.01758, audio_tagging_loss=0.01758, over 1119878.28 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:07:45,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1493680.0, ans=0.0 2023-12-24 04:07:58,925 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:08:00,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1493813.3333333333, ans=0.1 2023-12-24 04:08:13,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1493880.0, ans=0.0 2023-12-24 04:08:14,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.25 vs. limit=15.0 2023-12-24 04:08:23,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2023-12-24 04:08:31,115 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.056e+01 4.648e+01 5.127e+01 5.664e+01 9.776e+01, threshold=1.025e+02, percent-clipped=5.0 2023-12-24 04:08:33,027 INFO [train.py:886] (1/4) Epoch 48, batch 100, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 1965985.88 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:08:41,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1494013.3333333333, ans=0.125 2023-12-24 04:08:47,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1494080.0, ans=0.125 2023-12-24 04:08:51,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1494080.0, ans=0.0 2023-12-24 04:09:04,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1494213.3333333333, ans=0.125 2023-12-24 04:09:10,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1494213.3333333333, ans=0.125 2023-12-24 04:09:20,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-12-24 04:09:24,827 INFO [train.py:886] (1/4) Epoch 48, batch 150, loss[loss=0.01184, audio_tagging_loss=0.01184, over 22194.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 2632631.39 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:09:32,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1494346.6666666667, ans=0.2 2023-12-24 04:09:40,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1494413.3333333333, ans=0.125 2023-12-24 04:09:43,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1494413.3333333333, ans=0.0 2023-12-24 04:09:48,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1494480.0, ans=0.125 2023-12-24 04:09:48,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1494480.0, ans=0.0 2023-12-24 04:09:50,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1494480.0, ans=0.0 2023-12-24 04:09:54,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1494546.6666666667, ans=0.0 2023-12-24 04:10:14,270 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.608e+01 4.085e+01 4.287e+01 4.458e+01 4.971e+01, threshold=8.574e+01, percent-clipped=0.0 2023-12-24 04:10:16,199 INFO [train.py:886] (1/4) Epoch 48, batch 200, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 3153296.22 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:10:33,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.79 vs. 
limit=15.0 2023-12-24 04:10:43,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1494813.3333333333, ans=0.0 2023-12-24 04:10:44,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1494813.3333333333, ans=0.0 2023-12-24 04:10:50,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1494880.0, ans=10.0 2023-12-24 04:11:08,778 INFO [train.py:886] (1/4) Epoch 48, batch 250, loss[loss=0.009854, audio_tagging_loss=0.009854, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 3560943.54 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:11:15,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1495013.3333333333, ans=0.0 2023-12-24 04:11:15,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1495013.3333333333, ans=10.0 2023-12-24 04:11:21,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2023-12-24 04:11:25,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1495080.0, ans=0.0 2023-12-24 04:11:43,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1495213.3333333333, ans=0.125 2023-12-24 04:11:51,250 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:11:58,317 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.624e+01 3.958e+01 4.151e+01 4.361e+01 5.160e+01, threshold=8.303e+01, percent-clipped=0.0 2023-12-24 04:12:00,940 INFO [train.py:886] (1/4) Epoch 48, batch 300, loss[loss=0.008096, audio_tagging_loss=0.008096, over 21674.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 3863903.88 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:12:02,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1495346.6666666667, ans=0.0 2023-12-24 04:12:02,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2023-12-24 04:12:03,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1495346.6666666667, ans=0.125 2023-12-24 04:12:11,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1495413.3333333333, ans=0.0 2023-12-24 04:12:22,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1495480.0, ans=0.125 2023-12-24 04:12:36,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1495546.6666666667, ans=0.0 2023-12-24 04:12:52,533 INFO [train.py:886] (1/4) Epoch 48, batch 350, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4101907.30 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:13:08,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1495746.6666666667, ans=0.125 2023-12-24 04:13:21,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1495813.3333333333, ans=0.0 2023-12-24 04:13:42,334 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.481e+01 3.929e+01 4.116e+01 4.285e+01 5.525e+01, threshold=8.232e+01, percent-clipped=0.0 2023-12-24 04:13:44,926 INFO [train.py:886] (1/4) Epoch 48, batch 400, loss[loss=0.00824, audio_tagging_loss=0.00824, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4289055.45 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:13:52,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1496013.3333333333, ans=0.0 2023-12-24 04:13:53,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1496013.3333333333, ans=0.0 2023-12-24 04:14:04,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-12-24 04:14:13,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1496146.6666666667, ans=0.07 2023-12-24 04:14:18,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1496213.3333333333, ans=0.0 2023-12-24 04:14:25,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1496280.0, ans=0.125 2023-12-24 04:14:35,635 INFO [train.py:886] (1/4) Epoch 48, batch 450, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4439341.31 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:14:48,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1496413.3333333333, ans=0.2 2023-12-24 04:15:04,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1496480.0, ans=0.125 2023-12-24 04:15:07,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1496546.6666666667, ans=0.2 2023-12-24 04:15:16,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1496613.3333333333, ans=10.0 2023-12-24 04:15:26,674 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.400e+01 3.891e+01 4.102e+01 4.306e+01 5.682e+01, threshold=8.203e+01, percent-clipped=0.0 2023-12-24 04:15:28,581 INFO [train.py:886] (1/4) Epoch 48, batch 500, loss[loss=0.01127, audio_tagging_loss=0.01127, over 21500.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4548804.13 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:15:32,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. 
limit=15.0 2023-12-24 04:15:35,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1496680.0, ans=0.95 2023-12-24 04:15:36,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2023-12-24 04:15:38,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1496746.6666666667, ans=0.0 2023-12-24 04:15:45,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2023-12-24 04:15:49,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1496813.3333333333, ans=0.09899494936611666 2023-12-24 04:15:55,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1496813.3333333333, ans=0.0 2023-12-24 04:16:02,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.61 vs. limit=15.0 2023-12-24 04:16:08,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1496946.6666666667, ans=0.125 2023-12-24 04:16:11,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1496946.6666666667, ans=6.0 2023-12-24 04:16:19,550 INFO [train.py:886] (1/4) Epoch 48, batch 550, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4640023.84 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:16:20,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=15.0 2023-12-24 04:16:29,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1497080.0, ans=0.125 2023-12-24 04:16:32,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-12-24 04:16:33,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.75 vs. limit=10.0 2023-12-24 04:16:39,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1497146.6666666667, ans=0.0 2023-12-24 04:16:40,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-12-24 04:16:54,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.91 vs. limit=10.0 2023-12-24 04:17:03,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.94 vs. 
limit=22.5 2023-12-24 04:17:10,274 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.621e+01 4.004e+01 4.167e+01 4.372e+01 5.263e+01, threshold=8.335e+01, percent-clipped=0.0 2023-12-24 04:17:12,153 INFO [train.py:886] (1/4) Epoch 48, batch 600, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4707212.39 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:17:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1497413.3333333333, ans=0.0 2023-12-24 04:17:29,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=10.0 2023-12-24 04:17:32,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1497480.0, ans=0.125 2023-12-24 04:17:36,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1497480.0, ans=0.125 2023-12-24 04:17:37,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.42 vs. limit=22.5 2023-12-24 04:17:48,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5 2023-12-24 04:17:57,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=12.0 2023-12-24 04:18:03,735 INFO [train.py:886] (1/4) Epoch 48, batch 650, loss[loss=0.009127, audio_tagging_loss=0.009127, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4751644.16 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:18:25,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-12-24 04:18:37,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1497880.0, ans=0.1 2023-12-24 04:18:52,643 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.607e+01 3.971e+01 4.176e+01 4.392e+01 5.928e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 04:18:55,247 INFO [train.py:886] (1/4) Epoch 48, batch 700, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4792677.10 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:19:04,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1498080.0, ans=0.2 2023-12-24 04:19:11,042 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:19:20,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1498146.6666666667, ans=0.0 2023-12-24 04:19:26,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1498213.3333333333, ans=0.0 2023-12-24 04:19:28,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1498213.3333333333, ans=0.125 2023-12-24 04:19:31,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1498213.3333333333, ans=0.1 2023-12-24 04:19:41,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1498280.0, ans=0.0 2023-12-24 04:19:42,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1498280.0, ans=0.1 2023-12-24 04:19:45,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1498280.0, ans=0.1 2023-12-24 04:19:46,796 INFO [train.py:886] (1/4) Epoch 48, batch 750, loss[loss=0.009866, audio_tagging_loss=0.009866, over 24034.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4827892.95 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:19:48,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1498346.6666666667, ans=0.125 2023-12-24 04:19:50,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-12-24 04:19:51,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2023-12-24 04:20:05,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1498480.0, ans=0.2 2023-12-24 04:20:20,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1498546.6666666667, ans=0.125 2023-12-24 04:20:36,083 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.587e+01 3.930e+01 4.066e+01 4.267e+01 5.066e+01, threshold=8.132e+01, percent-clipped=0.0 2023-12-24 04:20:38,019 INFO [train.py:886] (1/4) Epoch 48, batch 800, loss[loss=0.007985, audio_tagging_loss=0.007985, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4851100.42 frames. 
2023-12-24 04:20:41,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1498680.0, ans=0.0
2023-12-24 04:20:46,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1498680.0, ans=0.125
2023-12-24 04:20:49,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1498746.6666666667, ans=0.2
2023-12-24 04:20:50,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1498746.6666666667, ans=0.2
2023-12-24 04:20:54,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1498746.6666666667, ans=0.2
2023-12-24 04:21:20,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1498946.6666666667, ans=0.125
2023-12-24 04:21:30,309 INFO [train.py:886] (1/4) Epoch 48, batch 850, loss[loss=0.00905, audio_tagging_loss=0.00905, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4878010.87 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:21:49,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0
2023-12-24 04:22:06,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0
2023-12-24 04:22:20,132 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.702e+01 4.028e+01 4.180e+01 4.366e+01 5.371e+01, threshold=8.359e+01, percent-clipped=0.0
2023-12-24 04:22:22,867 INFO [train.py:886] (1/4) Epoch 48, batch 900, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4899126.67 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:22:32,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1499413.3333333333, ans=0.035
2023-12-24 04:22:32,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=22.5
2023-12-24 04:22:37,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1499413.3333333333, ans=0.125
2023-12-24 04:23:12,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1499613.3333333333, ans=0.125
2023-12-24 04:23:14,539 INFO [train.py:886] (1/4) Epoch 48, batch 950, loss[loss=0.01059, audio_tagging_loss=0.01059, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4906177.88 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:23:31,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1499746.6666666667, ans=0.2
2023-12-24 04:23:48,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1499880.0, ans=0.125
2023-12-24 04:23:49,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1499880.0, ans=0.04949747468305833
2023-12-24 04:23:51,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1499880.0, ans=0.2
2023-12-24 04:24:01,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1499946.6666666667, ans=0.1
2023-12-24 04:24:04,740 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.583e+01 3.993e+01 4.147e+01 4.321e+01 5.221e+01, threshold=8.295e+01, percent-clipped=0.0
2023-12-24 04:24:04,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1499946.6666666667, ans=0.2
2023-12-24 04:24:06,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1500013.3333333333, ans=0.1
2023-12-24 04:24:07,320 INFO [train.py:886] (1/4) Epoch 48, batch 1000, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4907281.47 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:24:23,344 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:24:24,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0
2023-12-24 04:24:25,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1500080.0, ans=0.125
2023-12-24 04:24:37,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1500213.3333333333, ans=0.2
2023-12-24 04:24:39,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.84 vs. limit=22.5
2023-12-24 04:24:45,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1500213.3333333333, ans=0.0
2023-12-24 04:24:58,993 INFO [train.py:886] (1/4) Epoch 48, batch 1050, loss[loss=0.009344, audio_tagging_loss=0.009344, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4915839.08 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
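In the train.py:886 lines, loss[...] is the current batch and tot_loss[...] is a running, frame-weighted summary whose "over N frames" count plateaus near 4.9M-5.0M frames rather than growing without bound, which suggests geometric down-weighting of older batches. A sketch of that reading (the decay constant is an assumption chosen to match the observed plateau, roughly 24750 / 0.005 = 4.95e6 frames; it is not a value taken from train.py):

def update_tot_loss(state: dict, batch_loss: float, batch_frames: float,
                    decay: float = 0.995) -> dict:
    # Frame-weighted running average with geometric forgetting: each batch
    # adds its frames and loss*frames, while old totals decay per batch,
    # so the frame total saturates near batch_frames / (1 - decay).
    frames = state.get("frames", 0.0) * decay + batch_frames
    loss_sum = state.get("loss_sum", 0.0) * decay + batch_loss * batch_frames
    return {"frames": frames, "loss_sum": loss_sum, "loss": loss_sum / frames}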
2023-12-24 04:25:05,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1500346.6666666667, ans=0.0
2023-12-24 04:25:14,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1500413.3333333333, ans=0.07
2023-12-24 04:25:17,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500413.3333333333, ans=0.1
2023-12-24 04:25:23,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1500480.0, ans=0.2
2023-12-24 04:25:23,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1500480.0, ans=0.0
2023-12-24 04:25:33,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1500546.6666666667, ans=0.125
2023-12-24 04:25:42,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1500613.3333333333, ans=0.125
2023-12-24 04:25:48,715 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.950e+01 4.096e+01 4.313e+01 4.903e+01, threshold=8.193e+01, percent-clipped=0.0
2023-12-24 04:25:50,623 INFO [train.py:886] (1/4) Epoch 48, batch 1100, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4927451.98 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:25:52,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5
2023-12-24 04:25:54,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1500680.0, ans=0.95
2023-12-24 04:26:11,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1500813.3333333333, ans=0.125
2023-12-24 04:26:17,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1500813.3333333333, ans=0.05
2023-12-24 04:26:35,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1500946.6666666667, ans=0.2
2023-12-24 04:26:42,947 INFO [train.py:886] (1/4) Epoch 48, batch 1150, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4935366.55 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:26:47,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1501013.3333333333, ans=0.0
2023-12-24 04:26:48,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0
2023-12-24 04:26:54,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=22.5
2023-12-24 04:27:11,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1501146.6666666667, ans=0.125
2023-12-24 04:27:16,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1501213.3333333333, ans=0.0
2023-12-24 04:27:32,817 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.629e+01 3.995e+01 4.171e+01 4.338e+01 4.792e+01, threshold=8.343e+01, percent-clipped=0.0
2023-12-24 04:27:34,749 INFO [train.py:886] (1/4) Epoch 48, batch 1200, loss[loss=0.0099, audio_tagging_loss=0.0099, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4942140.00 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:27:35,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1501346.6666666667, ans=0.0
2023-12-24 04:28:05,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1501546.6666666667, ans=0.125
2023-12-24 04:28:07,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1501546.6666666667, ans=0.125
2023-12-24 04:28:26,241 INFO [train.py:886] (1/4) Epoch 48, batch 1250, loss[loss=0.009684, audio_tagging_loss=0.009684, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4937883.41 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:28:39,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1501746.6666666667, ans=0.0
2023-12-24 04:28:41,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1501746.6666666667, ans=0.0
2023-12-24 04:28:44,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1501746.6666666667, ans=0.1
2023-12-24 04:28:46,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1501813.3333333333, ans=0.125
2023-12-24 04:28:47,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1501813.3333333333, ans=0.0
2023-12-24 04:28:48,228 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0
2023-12-24 04:29:16,864 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.731e+01 4.024e+01 4.196e+01 4.446e+01 5.087e+01, threshold=8.392e+01, percent-clipped=0.0
2023-12-24 04:29:18,767 INFO [train.py:886] (1/4) Epoch 48, batch 1300, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4938852.86 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:29:22,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1502013.3333333333, ans=0.125
2023-12-24 04:29:23,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1502013.3333333333, ans=0.1
2023-12-24 04:29:25,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1502013.3333333333, ans=0.125
2023-12-24 04:29:26,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.81 vs. limit=10.0
2023-12-24 04:29:38,973 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:29:42,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1502146.6666666667, ans=0.125
2023-12-24 04:29:44,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1502146.6666666667, ans=0.1
2023-12-24 04:29:50,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1502213.3333333333, ans=0.0
2023-12-24 04:29:53,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0
2023-12-24 04:30:04,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1502280.0, ans=0.1
2023-12-24 04:30:08,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1502280.0, ans=0.04949747468305833
2023-12-24 04:30:10,928 INFO [train.py:886] (1/4) Epoch 48, batch 1350, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4938645.57 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:30:16,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0
2023-12-24 04:30:58,852 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.408e-02
2023-12-24 04:31:00,586 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 3.929e+01 4.129e+01 4.400e+01 5.128e+01, threshold=8.257e+01, percent-clipped=0.0
2023-12-24 04:31:02,530 INFO [train.py:886] (1/4) Epoch 48, batch 1400, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4938168.11 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
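The Whitening entries (scaling.py:1022) compare a per-module whiteness statistic of the activations against a limit; it is only worth noting when metric approaches or exceeds limit, as with metric=27.42 vs. limit=22.5 earlier in this epoch. A metric with the logged behavior can be built from the covariance spectrum: for a group covariance C over d channels, d * tr(C^2) / tr(C)^2 equals 1.0 for perfectly whitened features (C proportional to the identity) and grows as variance concentrates in a few directions. The sketch below is a plausible reconstruction of such a metric, not a verbatim copy of scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). Split channels into groups, compute
    # each group's covariance C, and return d * tr(C C) / tr(C)^2 averaged
    # over groups: 1.0 for fully whitened features, larger when a few
    # directions dominate the variance.
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0)            # zero-mean per channel
        c = g.t() @ g / g.shape[0]       # (d, d) covariance estimate
        d = c.shape[0]
        metrics.append(d * (c @ c).trace() / c.trace().pow(2))
    return torch.stack(metrics).mean().item()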
2023-12-24 04:31:21,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1502746.6666666667, ans=0.125
2023-12-24 04:31:25,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1502813.3333333333, ans=0.125
2023-12-24 04:31:31,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1502813.3333333333, ans=0.0
2023-12-24 04:31:37,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1502880.0, ans=0.1
2023-12-24 04:31:44,398 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:31:46,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1502946.6666666667, ans=10.0
2023-12-24 04:31:54,557 INFO [train.py:886] (1/4) Epoch 48, batch 1450, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4937981.26 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:32:03,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1503013.3333333333, ans=0.0
2023-12-24 04:32:09,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0
2023-12-24 04:32:15,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0
2023-12-24 04:32:16,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1503146.6666666667, ans=0.2
2023-12-24 04:32:43,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503280.0, ans=0.1
2023-12-24 04:32:44,960 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.529e+01 3.948e+01 4.171e+01 4.358e+01 4.772e+01, threshold=8.342e+01, percent-clipped=0.0
2023-12-24 04:32:46,904 INFO [train.py:886] (1/4) Epoch 48, batch 1500, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4939288.01 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:32:49,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1503346.6666666667, ans=0.0
2023-12-24 04:32:52,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1503346.6666666667, ans=0.1
2023-12-24 04:33:40,129 INFO [train.py:886] (1/4) Epoch 48, batch 1550, loss[loss=0.00862, audio_tagging_loss=0.00862, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4940909.92 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:33:44,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1503680.0, ans=0.125
2023-12-24 04:33:49,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1503746.6666666667, ans=0.125
2023-12-24 04:33:55,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1503746.6666666667, ans=0.125
2023-12-24 04:33:58,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1503746.6666666667, ans=0.125
2023-12-24 04:34:12,402 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:34:19,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1503880.0, ans=0.125
2023-12-24 04:34:19,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503880.0, ans=0.1
2023-12-24 04:34:29,206 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.717e+01 4.053e+01 4.191e+01 4.372e+01 4.989e+01, threshold=8.382e+01, percent-clipped=0.0
2023-12-24 04:34:31,134 INFO [train.py:886] (1/4) Epoch 48, batch 1600, loss[loss=0.01354, audio_tagging_loss=0.01354, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4938415.76 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:35:02,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1504213.3333333333, ans=0.0
2023-12-24 04:35:06,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1504213.3333333333, ans=0.1
2023-12-24 04:35:18,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1504280.0, ans=0.2
2023-12-24 04:35:22,952 INFO [train.py:886] (1/4) Epoch 48, batch 1650, loss[loss=0.008553, audio_tagging_loss=0.008553, over 22347.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4932189.14 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:35:24,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1504346.6666666667, ans=0.0
2023-12-24 04:35:27,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1504346.6666666667, ans=0.125
2023-12-24 04:35:35,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1504413.3333333333, ans=0.2
2023-12-24 04:35:44,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1504480.0, ans=0.125
2023-12-24 04:35:51,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1504480.0, ans=0.125
2023-12-24 04:35:51,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1504480.0, ans=0.125
2023-12-24 04:35:59,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1504546.6666666667, ans=0.125
2023-12-24 04:36:07,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1504613.3333333333, ans=0.0
2023-12-24 04:36:11,661 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.961e+01 4.120e+01 4.349e+01 5.089e+01, threshold=8.240e+01, percent-clipped=0.0
2023-12-24 04:36:14,271 INFO [train.py:886] (1/4) Epoch 48, batch 1700, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24919.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4937531.46 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0
2023-12-24 04:36:19,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1504680.0, ans=0.04949747468305833
2023-12-24 04:36:34,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2023-12-24 04:36:37,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1504813.3333333333, ans=0.0
2023-12-24 04:36:47,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1504880.0, ans=0.125
2023-12-24 04:37:06,803 INFO [train.py:886] (1/4) Epoch 48, batch 1750, loss[loss=0.009655, audio_tagging_loss=0.009655, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4945950.79 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0
2023-12-24 04:37:08,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1505013.3333333333, ans=0.2
2023-12-24 04:37:12,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5
2023-12-24 04:37:14,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0
2023-12-24 04:37:25,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1505080.0, ans=0.0
2023-12-24 04:37:29,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1505146.6666666667, ans=0.0
2023-12-24 04:37:29,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1505146.6666666667, ans=0.0
2023-12-24 04:37:35,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1505146.6666666667, ans=0.0
2023-12-24 04:37:56,316 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.943e+01 4.153e+01 4.308e+01 5.197e+01, threshold=8.305e+01, percent-clipped=0.0
2023-12-24 04:37:59,056 INFO [train.py:886] (1/4) Epoch 48, batch 1800, loss[loss=0.01048, audio_tagging_loss=0.01048, over 21571.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4950929.53 frames. ], batch size: 107, lr: 2.23e-03, grad_scale: 32.0
2023-12-24 04:38:07,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0
2023-12-24 04:38:12,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2023-12-24 04:38:13,179 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:38:14,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.39 vs. limit=15.0
2023-12-24 04:38:45,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=15.0
2023-12-24 04:38:50,143 INFO [train.py:886] (1/4) Epoch 48, batch 1850, loss[loss=0.00966, audio_tagging_loss=0.00966, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4952381.81 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0
2023-12-24 04:39:05,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0
2023-12-24 04:39:15,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1505813.3333333333, ans=0.125
2023-12-24 04:39:24,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1505880.0, ans=0.125
2023-12-24 04:39:40,445 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 4.037e+01 4.214e+01 4.372e+01 5.067e+01, threshold=8.428e+01, percent-clipped=0.0
2023-12-24 04:39:42,329 INFO [train.py:886] (1/4) Epoch 48, batch 1900, loss[loss=0.009534, audio_tagging_loss=0.009534, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4943839.92 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0
2023-12-24 04:39:42,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1506013.3333333333, ans=0.07
2023-12-24 04:39:47,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2023-12-24 04:40:02,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=12.0
2023-12-24 04:40:15,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0
2023-12-24 04:40:27,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5
2023-12-24 04:40:33,861 INFO [train.py:886] (1/4) Epoch 48, batch 1950, loss[loss=0.01031, audio_tagging_loss=0.01031, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4946169.75 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0
2023-12-24 04:40:47,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0
2023-12-24 04:40:50,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1506413.3333333333, ans=0.125
2023-12-24 04:41:20,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1506613.3333333333, ans=0.05
2023-12-24 04:41:24,468 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.628e+01 3.985e+01 4.127e+01 4.367e+01 5.324e+01, threshold=8.253e+01, percent-clipped=0.0
2023-12-24 04:41:26,440 INFO [train.py:886] (1/4) Epoch 48, batch 2000, loss[loss=0.008254, audio_tagging_loss=0.008254, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4950296.81 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:41:37,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1506746.6666666667, ans=0.1
2023-12-24 04:42:17,928 INFO [train.py:886] (1/4) Epoch 48, batch 2050, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4950547.50 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:42:22,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1507013.3333333333, ans=0.1
2023-12-24 04:42:23,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1507013.3333333333, ans=0.015
2023-12-24 04:42:36,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1507080.0, ans=0.2
2023-12-24 04:42:38,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1507146.6666666667, ans=0.125
2023-12-24 04:42:47,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5
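This run uses fp16, and the grad_scale printed at the end of each train.py:886 line is the dynamic loss scale of mixed-precision training: it doubles from 32.0 to 64.0 at batch 2000 after a long overflow-free stretch, and is halved again when an overflow is detected. A minimal sketch of that doubling/halving policy (illustrative; PyTorch's torch.cuda.amp.GradScaler is the production implementation):

class DynamicLossScale:
    # Double the fp16 loss scale after `growth_interval` consecutive
    # overflow-free steps; halve it on overflow (values illustrative).
    def __init__(self, scale=32.0, growth_interval=2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool):
        if found_inf:
            self.scale /= 2.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= 2.0   # e.g. 32.0 -> 64.0, as in the log

The drop back to grad_scale: 32.0 visible near batch 3350 further below is consistent with the halving branch firing after an overflow.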
2023-12-24 04:42:48,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1507213.3333333333, ans=0.09899494936611666
2023-12-24 04:42:49,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1507213.3333333333, ans=0.2
2023-12-24 04:42:51,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0
2023-12-24 04:42:55,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1507213.3333333333, ans=0.1
2023-12-24 04:43:07,461 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.609e+01 3.972e+01 4.171e+01 4.415e+01 5.113e+01, threshold=8.342e+01, percent-clipped=0.0
2023-12-24 04:43:09,383 INFO [train.py:886] (1/4) Epoch 48, batch 2100, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4952148.72 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:43:12,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1507346.6666666667, ans=0.0
2023-12-24 04:43:15,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1507346.6666666667, ans=0.1
2023-12-24 04:43:27,952 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:43:44,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1507546.6666666667, ans=0.125
2023-12-24 04:43:53,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507613.3333333333, ans=0.1
2023-12-24 04:44:00,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0
2023-12-24 04:44:00,795 INFO [train.py:886] (1/4) Epoch 48, batch 2150, loss[loss=0.009567, audio_tagging_loss=0.009567, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4951020.82 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:44:01,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1507680.0, ans=0.2
2023-12-24 04:44:20,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1507746.6666666667, ans=0.125
2023-12-24 04:44:23,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1507813.3333333333, ans=0.07
2023-12-24 04:44:27,379 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:44:31,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.34 vs. limit=10.0
2023-12-24 04:44:32,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1507880.0, ans=0.0
2023-12-24 04:44:43,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1507946.6666666667, ans=0.0
2023-12-24 04:44:43,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1507946.6666666667, ans=0.125
2023-12-24 04:44:50,297 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.605e+01 4.004e+01 4.221e+01 4.415e+01 5.119e+01, threshold=8.442e+01, percent-clipped=0.0
2023-12-24 04:44:52,944 INFO [train.py:886] (1/4) Epoch 48, batch 2200, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4948078.38 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:45:02,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5
2023-12-24 04:45:12,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0
2023-12-24 04:45:15,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0
2023-12-24 04:45:19,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1508146.6666666667, ans=0.2
2023-12-24 04:45:19,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1508146.6666666667, ans=15.0
2023-12-24 04:45:25,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0
2023-12-24 04:45:32,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1508280.0, ans=10.0
2023-12-24 04:45:39,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1508280.0, ans=0.2
2023-12-24 04:45:40,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1508280.0, ans=0.2
2023-12-24 04:45:43,768 INFO [train.py:886] (1/4) Epoch 48, batch 2250, loss[loss=0.009909, audio_tagging_loss=0.009909, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4951141.48 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:45:44,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1508346.6666666667, ans=0.125
2023-12-24 04:46:02,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1508413.3333333333, ans=0.1
2023-12-24 04:46:08,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1508480.0, ans=0.125
2023-12-24 04:46:11,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1508480.0, ans=0.2
2023-12-24 04:46:12,530 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:46:16,343 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:46:32,556 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.473e+01 4.034e+01 4.155e+01 4.335e+01 5.306e+01, threshold=8.310e+01, percent-clipped=0.0
2023-12-24 04:46:34,440 INFO [train.py:886] (1/4) Epoch 48, batch 2300, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4949070.42 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:46:35,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1508680.0, ans=0.0
2023-12-24 04:46:43,435 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:46:51,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1508746.6666666667, ans=0.2
2023-12-24 04:46:51,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1508746.6666666667, ans=0.125
2023-12-24 04:47:14,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1508946.6666666667, ans=0.125
2023-12-24 04:47:20,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1508946.6666666667, ans=0.0
2023-12-24 04:47:25,851 INFO [train.py:886] (1/4) Epoch 48, batch 2350, loss[loss=0.00911, audio_tagging_loss=0.00911, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4953804.78 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:47:28,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1509013.3333333333, ans=0.0
2023-12-24 04:47:35,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1509013.3333333333, ans=0.0
2023-12-24 04:47:53,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1509146.6666666667, ans=0.0
2023-12-24 04:48:01,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1509213.3333333333, ans=0.0
2023-12-24 04:48:11,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1509280.0, ans=0.0
2023-12-24 04:48:15,769 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 3.960e+01 4.074e+01 4.287e+01 4.962e+01, threshold=8.148e+01, percent-clipped=0.0
2023-12-24 04:48:17,723 INFO [train.py:886] (1/4) Epoch 48, batch 2400, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4956121.72 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:48:26,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1509346.6666666667, ans=0.125
2023-12-24 04:48:39,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1509480.0, ans=0.125
2023-12-24 04:49:07,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0
2023-12-24 04:49:10,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1509680.0, ans=0.0
2023-12-24 04:49:10,951 INFO [train.py:886] (1/4) Epoch 48, batch 2450, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4952794.88 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:49:13,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2023-12-24 04:49:30,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0
2023-12-24 04:49:41,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1509880.0, ans=0.2
2023-12-24 04:50:00,248 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.486e+01 4.002e+01 4.128e+01 4.314e+01 5.407e+01, threshold=8.257e+01, percent-clipped=0.0
2023-12-24 04:50:02,156 INFO [train.py:886] (1/4) Epoch 48, batch 2500, loss[loss=0.009176, audio_tagging_loss=0.009176, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4952264.17 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:50:42,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1510213.3333333333, ans=0.125
2023-12-24 04:50:54,097 INFO [train.py:886] (1/4) Epoch 48, batch 2550, loss[loss=0.008963, audio_tagging_loss=0.008963, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4950814.87 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:50:57,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1510346.6666666667, ans=0.125
2023-12-24 04:51:23,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1510480.0, ans=0.1
2023-12-24 04:51:24,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1510546.6666666667, ans=15.0
2023-12-24 04:51:27,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1510546.6666666667, ans=0.125
2023-12-24 04:51:33,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1510546.6666666667, ans=0.1
2023-12-24 04:51:36,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1510613.3333333333, ans=0.1
2023-12-24 04:51:37,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=12.0
2023-12-24 04:51:43,844 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.503e+01 4.050e+01 4.211e+01 4.452e+01 5.003e+01, threshold=8.421e+01, percent-clipped=0.0
2023-12-24 04:51:46,468 INFO [train.py:886] (1/4) Epoch 48, batch 2600, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4947338.88 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:51:57,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1510746.6666666667, ans=0.125
2023-12-24 04:51:59,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1510746.6666666667, ans=0.125
2023-12-24 04:52:05,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0
2023-12-24 04:52:12,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.49 vs. limit=10.0
2023-12-24 04:52:27,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1510946.6666666667, ans=0.125
2023-12-24 04:52:31,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0
2023-12-24 04:52:37,990 INFO [train.py:886] (1/4) Epoch 48, batch 2650, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4949568.49 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:52:38,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1511013.3333333333, ans=0.125
2023-12-24 04:53:01,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1511146.6666666667, ans=0.1
2023-12-24 04:53:27,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0
2023-12-24 04:53:28,180 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.465e+01 3.977e+01 4.124e+01 4.274e+01 5.169e+01, threshold=8.248e+01, percent-clipped=0.0
2023-12-24 04:53:30,070 INFO [train.py:886] (1/4) Epoch 48, batch 2700, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4955626.36 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:53:36,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1511346.6666666667, ans=0.0
2023-12-24 04:53:41,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1511413.3333333333, ans=0.0
2023-12-24 04:53:44,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1511413.3333333333, ans=0.0
2023-12-24 04:53:51,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1511480.0, ans=0.125
2023-12-24 04:54:07,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1511546.6666666667, ans=0.125
2023-12-24 04:54:15,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0
2023-12-24 04:54:18,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1511613.3333333333, ans=0.125
2023-12-24 04:54:20,666 INFO [train.py:886] (1/4) Epoch 48, batch 2750, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4958003.70 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:54:38,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1511746.6666666667, ans=0.125
2023-12-24 04:54:52,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1511880.0, ans=0.125
2023-12-24 04:54:56,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1511880.0, ans=0.125
2023-12-24 04:55:06,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0
2023-12-24 04:55:07,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0
2023-12-24 04:55:08,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1511946.6666666667, ans=0.2
2023-12-24 04:55:10,762 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.456e+01 3.953e+01 4.094e+01 4.279e+01 4.852e+01, threshold=8.188e+01, percent-clipped=0.0
2023-12-24 04:55:12,678 INFO [train.py:886] (1/4) Epoch 48, batch 2800, loss[loss=0.01045, audio_tagging_loss=0.01045, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4956234.51 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:55:24,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.47 vs. limit=10.0
2023-12-24 04:55:26,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1512080.0, ans=0.125
2023-12-24 04:55:33,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1512146.6666666667, ans=0.0
2023-12-24 04:55:47,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1512213.3333333333, ans=0.2
2023-12-24 04:55:55,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1512280.0, ans=0.1
2023-12-24 04:56:02,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1512280.0, ans=0.125
2023-12-24 04:56:04,335 INFO [train.py:886] (1/4) Epoch 48, batch 2850, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4949345.37 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:56:04,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0
2023-12-24 04:56:22,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1512413.3333333333, ans=0.125
2023-12-24 04:56:23,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1512480.0, ans=0.125
2023-12-24 04:56:25,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1512480.0, ans=0.125
2023-12-24 04:56:41,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1512546.6666666667, ans=0.1
2023-12-24 04:56:43,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=12.0
2023-12-24 04:56:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1512613.3333333333, ans=0.125
2023-12-24 04:56:53,476 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.637e+01 3.988e+01 4.154e+01 4.396e+01 6.475e+01, threshold=8.307e+01, percent-clipped=0.0
2023-12-24 04:56:53,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1512613.3333333333, ans=0.0
2023-12-24 04:56:55,366 INFO [train.py:886] (1/4) Epoch 48, batch 2900, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4946359.81 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:57:08,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1512746.6666666667, ans=0.2
2023-12-24 04:57:14,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1512746.6666666667, ans=0.1
2023-12-24 04:57:47,896 INFO [train.py:886] (1/4) Epoch 48, batch 2950, loss[loss=0.009291, audio_tagging_loss=0.009291, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4952565.07 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:58:09,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1513146.6666666667, ans=0.2
2023-12-24 04:58:28,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2023-12-24 04:58:37,210 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.916e+01 4.052e+01 4.286e+01 4.882e+01, threshold=8.104e+01, percent-clipped=0.0
2023-12-24 04:58:39,120 INFO [train.py:886] (1/4) Epoch 48, batch 3000, loss[loss=0.009482, audio_tagging_loss=0.009482, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4953609.91 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0
2023-12-24 04:58:39,121 INFO [train.py:909] (1/4) Computing validation loss
2023-12-24 04:59:00,487 INFO [train.py:917] (1/4) Epoch 48, validation: loss=0.03695, audio_tagging_loss=0.03695, over 3737520.00 frames.
2023-12-24 04:59:00,488 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB
2023-12-24 04:59:04,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0
2023-12-24 04:59:05,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1513346.6666666667, ans=0.2
2023-12-24 04:59:16,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1513413.3333333333, ans=15.0
2023-12-24 04:59:18,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0
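The WithLoss entries (scaling.py:1118) report the accumulated auxiliary penalty attached to each self_attn_weights tensor; loss-sum=0.000e+00 means the constraint did not fire over the logging interval, while occasional nonzero values (e.g. loss-sum=4.408e-02 above) show it briefly engaging. A sketch of the accumulate-then-report pattern behind such messages (names illustrative, not the scaling.py implementation):

class PenaltyAccumulator:
    # Accumulates an auxiliary penalty attached to a named tensor and
    # reports the running sum at each logging interval, mimicking the
    # 'WithLoss: name=..., loss-sum=...' messages.
    def __init__(self, name: str):
        self.name = name
        self.loss_sum = 0.0

    def add(self, penalty: float):
        self.loss_sum += penalty

    def report(self) -> str:
        msg = f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}"
        self.loss_sum = 0.0   # reset after each logging interval
        return msg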
limit=15.0 2023-12-24 04:59:30,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1513480.0, ans=0.2 2023-12-24 04:59:46,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1513613.3333333333, ans=0.125 2023-12-24 04:59:52,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1513680.0, ans=0.04949747468305833 2023-12-24 04:59:52,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1513680.0, ans=6.0 2023-12-24 04:59:52,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=12.0 2023-12-24 04:59:52,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=12.0 2023-12-24 04:59:52,929 INFO [train.py:886] (1/4) Epoch 48, batch 3050, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4960958.63 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:00:12,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1513813.3333333333, ans=0.5 2023-12-24 05:00:33,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2023-12-24 05:00:41,843 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.634e+01 4.038e+01 4.200e+01 4.350e+01 5.861e+01, threshold=8.401e+01, percent-clipped=0.0 2023-12-24 05:00:44,479 INFO [train.py:886] (1/4) Epoch 48, batch 3100, loss[loss=0.01005, audio_tagging_loss=0.01005, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4958573.27 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:01:17,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1514213.3333333333, ans=0.1 2023-12-24 05:01:20,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1514213.3333333333, ans=0.0 2023-12-24 05:01:20,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=12.0 2023-12-24 05:01:34,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-12-24 05:01:36,065 INFO [train.py:886] (1/4) Epoch 48, batch 3150, loss[loss=0.00944, audio_tagging_loss=0.00944, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4957882.15 frames. 
], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:01:36,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1514346.6666666667, ans=0.2 2023-12-24 05:01:44,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1514346.6666666667, ans=0.0 2023-12-24 05:01:52,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1514413.3333333333, ans=0.125 2023-12-24 05:01:56,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1514480.0, ans=0.125 2023-12-24 05:01:56,379 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:02:22,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1514613.3333333333, ans=0.125 2023-12-24 05:02:25,638 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:02:26,563 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.591e+01 4.003e+01 4.208e+01 4.385e+01 5.175e+01, threshold=8.416e+01, percent-clipped=0.0 2023-12-24 05:02:26,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1514613.3333333333, ans=0.125 2023-12-24 05:02:28,486 INFO [train.py:886] (1/4) Epoch 48, batch 3200, loss[loss=0.0101, audio_tagging_loss=0.0101, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4955279.01 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:02:39,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0 2023-12-24 05:02:43,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1514746.6666666667, ans=0.125 2023-12-24 05:02:52,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=22.5 2023-12-24 05:02:59,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1514880.0, ans=0.0 2023-12-24 05:03:00,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-12-24 05:03:03,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1514880.0, ans=0.125 2023-12-24 05:03:06,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2023-12-24 05:03:16,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2023-12-24 05:03:20,324 INFO [train.py:886] (1/4) Epoch 48, batch 3250, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4958089.67 frames. 
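The `grad_scale: 64.0` field in the per-batch summaries (it drops to `32.0` a few hundred batches later) is the dynamic loss-scaling factor of fp16 training: gradients are computed on a scaled loss, the scaler halves the factor when an overflow is detected, and it grows it again after a run of clean steps. A minimal sketch with `torch.cuda.amp`; the model, optimizer, and loss here are placeholders and a CUDA device is assumed:

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda")
model = torch.nn.Linear(80, 527).to(device)        # placeholder model (527 = AudioSet event classes)
optimizer = torch.optim.AdamW(model.parameters())  # placeholder optimizer
scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

def train_step(features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = F.binary_cross_entropy_with_logits(model(features), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped internally if grads overflowed
    scaler.update()                # halve on overflow, grow after clean steps
    return loss.detach(), scaler.get_scale()  # the "grad_scale" in the log
```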
], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:03:45,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2023-12-24 05:03:51,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1515213.3333333333, ans=0.0 2023-12-24 05:03:55,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.00 vs. limit=22.5 2023-12-24 05:04:03,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1515280.0, ans=0.0 2023-12-24 05:04:07,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1515280.0, ans=22.5 2023-12-24 05:04:10,062 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.994e+01 4.173e+01 4.405e+01 4.939e+01, threshold=8.347e+01, percent-clipped=0.0 2023-12-24 05:04:12,017 INFO [train.py:886] (1/4) Epoch 48, batch 3300, loss[loss=0.008256, audio_tagging_loss=0.008256, over 24750.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4955594.55 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:04:20,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1515346.6666666667, ans=0.0 2023-12-24 05:04:36,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1515480.0, ans=0.0 2023-12-24 05:04:52,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1515546.6666666667, ans=0.0 2023-12-24 05:04:54,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.16 vs. limit=10.0 2023-12-24 05:05:04,990 INFO [train.py:886] (1/4) Epoch 48, batch 3350, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4955009.86 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:05:08,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1515680.0, ans=0.0 2023-12-24 05:05:23,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1515746.6666666667, ans=0.125 2023-12-24 05:05:45,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1515946.6666666667, ans=0.125 2023-12-24 05:05:46,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1515946.6666666667, ans=15.0 2023-12-24 05:05:54,536 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.602e+01 3.981e+01 4.138e+01 4.310e+01 5.201e+01, threshold=8.276e+01, percent-clipped=0.0 2023-12-24 05:05:56,184 INFO [train.py:886] (1/4) Epoch 48, batch 3400, loss[loss=0.008966, audio_tagging_loss=0.008966, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4958735.64 frames. 
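Each `WARNING [optim.py:484] Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...` line reports five order statistics (min, 25%, median, 75%, max) over a window of recent gradient norms; in every such line here the threshold equals `Clipping_scale` times the logged median (e.g. 2.0 × 4.052e+01 ≈ 8.104e+01), so a step is clipped only when its norm exceeds twice the recent median. A sketch of that bookkeeping under those assumptions; the class and window size are made up, and the real optimizer may track norms differently:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip to clipping_scale * median(recent grad norms); report quartiles."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent per-step gradient norms
        self.clipped = 0
        self.steps = 0

    def clip_(self, params):  # params: list of tensors with .grad populated
        # max_norm=inf makes clip_grad_norm_ a pure norm computation here
        norm = float(torch.nn.utils.clip_grad_norm_(params, float("inf")))
        self.norms.append(norm)
        qs = torch.quantile(torch.tensor(list(self.norms)),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * float(qs[2])  # 2 x median
        self.steps += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        # quartiles, threshold, percent-clipped: the WARNING line's fields
        return qs.tolist(), threshold, 100.0 * self.clipped / self.steps
```

The `percent-clipped=0.0` values throughout this stretch say the threshold is essentially never hit this late in training; gradient norms sit in a tight band around 4e+01.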
], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:05:58,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-12-24 05:06:00,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1516013.3333333333, ans=0.125 2023-12-24 05:06:03,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1516013.3333333333, ans=0.125 2023-12-24 05:06:12,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1516080.0, ans=0.125 2023-12-24 05:06:19,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1516146.6666666667, ans=0.1 2023-12-24 05:06:30,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1516213.3333333333, ans=0.125 2023-12-24 05:06:41,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1516280.0, ans=0.125 2023-12-24 05:06:46,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1516280.0, ans=0.0 2023-12-24 05:06:47,747 INFO [train.py:886] (1/4) Epoch 48, batch 3450, loss[loss=0.009827, audio_tagging_loss=0.009827, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4959806.27 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:06:47,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1516346.6666666667, ans=0.0 2023-12-24 05:06:48,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1516346.6666666667, ans=0.95 2023-12-24 05:06:56,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1516413.3333333333, ans=10.0 2023-12-24 05:07:02,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1516413.3333333333, ans=0.125 2023-12-24 05:07:13,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1516480.0, ans=0.035 2023-12-24 05:07:17,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1516546.6666666667, ans=0.2 2023-12-24 05:07:32,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1516613.3333333333, ans=0.2 2023-12-24 05:07:38,220 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 4.055e+01 4.203e+01 4.420e+01 4.923e+01, threshold=8.407e+01, percent-clipped=0.0 2023-12-24 05:07:39,189 INFO [train.py:886] (1/4) Epoch 48, batch 3500, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4949786.69 frames. 
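An aside on the oddly precise constants in these schedules: `ans=0.04949747468305833` and `ans=0.09899494936611666` for the various `bypass.skip_rate` entries are exactly 0.035·√2 and 0.07·√2, i.e. the plain 0.035 rate (which also appears unscaled for other layers) and its double, each multiplied by √2. A quick check:

```python
import math
print(0.035 * math.sqrt(2))  # 0.04949747468305833, as logged for bypass.skip_rate
print(0.07 * math.sqrt(2))   # 0.09899494936611666, as logged for layers.1 bypass.skip_rate
```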
], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:07:43,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1516680.0, ans=0.2 2023-12-24 05:07:47,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1516680.0, ans=0.2 2023-12-24 05:08:00,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1516813.3333333333, ans=0.125 2023-12-24 05:08:01,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1516813.3333333333, ans=0.2 2023-12-24 05:08:30,199 INFO [train.py:886] (1/4) Epoch 48, batch 3550, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4953113.73 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:08:31,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1517013.3333333333, ans=0.035 2023-12-24 05:08:32,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1517013.3333333333, ans=0.5 2023-12-24 05:08:42,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1517080.0, ans=0.0 2023-12-24 05:08:54,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1517146.6666666667, ans=0.2 2023-12-24 05:08:58,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1517146.6666666667, ans=0.1 2023-12-24 05:09:02,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2023-12-24 05:09:22,409 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.601e+01 3.929e+01 4.166e+01 4.382e+01 5.169e+01, threshold=8.332e+01, percent-clipped=0.0 2023-12-24 05:09:23,414 INFO [train.py:886] (1/4) Epoch 48, batch 3600, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4956741.88 frames. 
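The `Whitening: name=..., num_groups=..., num_channels=..., metric=X vs. limit=Y` lines fire when a whitening diagnostic on a module's output exceeds its (scheduled) limit; the metric measures how far the channel covariance is from a multiple of the identity. One plausible formulation, sketched here and not necessarily scaling.py's exact statistic: per channel group, the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows with anisotropy:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels), num_channels divisible by num_groups.
    Returns 1.0 iff each group's covariance is a multiple of the identity."""
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngi,ngj->gij", x, x) / num_frames  # (groups, c, c)
    eigs = torch.linalg.eigvalsh(cov)                      # (groups, c)
    return (eigs.pow(2).mean(dim=1) / eigs.mean(dim=1).pow(2)).mean()

x = torch.randn(20000, 64)                 # approximately white features
print(whitening_metric(x, num_groups=1))   # ~1.0, up to sampling noise
print(whitening_metric(x @ torch.randn(64, 64), num_groups=1))  # clearly > 1
```

Exceeding the limit triggers a corrective gradient penalty rather than an error, which is consistent with these being INFO-level lines.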
], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:09:23,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1517346.6666666667, ans=0.0 2023-12-24 05:09:27,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1517346.6666666667, ans=0.0 2023-12-24 05:09:44,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1517480.0, ans=0.0 2023-12-24 05:09:53,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1517546.6666666667, ans=0.035 2023-12-24 05:09:59,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1517546.6666666667, ans=0.09899494936611666 2023-12-24 05:10:06,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1517613.3333333333, ans=0.1 2023-12-24 05:10:14,237 INFO [train.py:886] (1/4) Epoch 48, batch 3650, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4951367.48 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:10:27,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2023-12-24 05:10:34,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.81 vs. limit=10.0 2023-12-24 05:10:38,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1517813.3333333333, ans=0.125 2023-12-24 05:10:46,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1517880.0, ans=0.125 2023-12-24 05:10:53,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1517880.0, ans=0.125 2023-12-24 05:10:59,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2023-12-24 05:11:05,044 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.632e+01 4.020e+01 4.210e+01 4.432e+01 5.110e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 05:11:06,035 INFO [train.py:886] (1/4) Epoch 48, batch 3700, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4951915.30 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:11:14,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs. 
limit=15.0 2023-12-24 05:11:28,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1518146.6666666667, ans=0.95 2023-12-24 05:11:35,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1518146.6666666667, ans=0.0 2023-12-24 05:11:40,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-12-24 05:11:51,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1518280.0, ans=0.0 2023-12-24 05:11:56,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1518280.0, ans=0.125 2023-12-24 05:11:58,849 INFO [train.py:886] (1/4) Epoch 48, batch 3750, loss[loss=0.008934, audio_tagging_loss=0.008934, over 21842.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4949234.95 frames. ], batch size: 107, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:12:00,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-12-24 05:12:04,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1518346.6666666667, ans=0.125 2023-12-24 05:12:34,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1518546.6666666667, ans=0.125 2023-12-24 05:12:36,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1518546.6666666667, ans=0.0 2023-12-24 05:12:49,909 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.049e+01 4.266e+01 4.434e+01 5.925e+01, threshold=8.531e+01, percent-clipped=0.0 2023-12-24 05:12:50,904 INFO [train.py:886] (1/4) Epoch 48, batch 3800, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4946597.69 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:12:51,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1518680.0, ans=0.0 2023-12-24 05:12:54,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-12-24 05:12:56,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-12-24 05:13:30,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1518880.0, ans=0.2 2023-12-24 05:13:40,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1518946.6666666667, ans=0.2 2023-12-24 05:13:43,509 INFO [train.py:886] (1/4) Epoch 48, batch 3850, loss[loss=0.009671, audio_tagging_loss=0.009671, over 22105.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4943097.61 frames. 
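The slowly decaying `lr:` values (2.23e-03 here, 2.22e-03 a few hundred batches later) are consistent with the Eden schedule used by zipformer recipes, where a base learning rate is discounted by smooth functions of both the global batch index and the epoch. A sketch of that formula, assuming the usual zipformer defaults (`base_lr=0.045`, `lr_batches=7500`, `lr_epochs=3.5`):

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden schedule, sketched: decays with both batch index and epoch."""
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With a global batch index in the low hundreds of thousands at epoch 48,
# this lands right where this log is:
print(eden_lr(0.045, batch=220_000, epoch=48))  # ~2.24e-03
```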
], batch size: 107, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:13:54,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1519080.0, ans=0.0 2023-12-24 05:13:55,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1519080.0, ans=0.0 2023-12-24 05:14:05,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1519146.6666666667, ans=0.1 2023-12-24 05:14:05,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1519146.6666666667, ans=0.0 2023-12-24 05:14:15,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1519213.3333333333, ans=0.125 2023-12-24 05:14:33,476 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 4.007e+01 4.136e+01 4.360e+01 5.120e+01, threshold=8.272e+01, percent-clipped=0.0 2023-12-24 05:14:35,546 INFO [train.py:886] (1/4) Epoch 48, batch 3900, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24922.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4941062.60 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:14:44,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1519346.6666666667, ans=0.125 2023-12-24 05:14:46,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1519413.3333333333, ans=0.125 2023-12-24 05:14:49,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1519413.3333333333, ans=0.2 2023-12-24 05:14:50,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1519413.3333333333, ans=0.0 2023-12-24 05:14:56,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1519480.0, ans=0.125 2023-12-24 05:14:58,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1519480.0, ans=0.125 2023-12-24 05:15:06,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1519546.6666666667, ans=0.125 2023-12-24 05:15:11,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1519546.6666666667, ans=0.125 2023-12-24 05:15:27,384 INFO [train.py:886] (1/4) Epoch 48, batch 3950, loss[loss=0.009968, audio_tagging_loss=0.009968, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4943328.49 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:15:29,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1519680.0, ans=0.5 2023-12-24 05:15:46,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1519746.6666666667, ans=15.0 2023-12-24 05:15:56,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.52 vs. 
limit=15.0 2023-12-24 05:15:59,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-12-24 05:16:20,888 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.458e+01 4.031e+01 4.185e+01 4.396e+01 5.576e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 05:16:21,854 INFO [train.py:886] (1/4) Epoch 48, batch 4000, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4943486.16 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:16:42,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1520146.6666666667, ans=0.2 2023-12-24 05:16:43,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-12-24 05:16:49,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.34 vs. limit=22.5 2023-12-24 05:17:13,345 INFO [train.py:886] (1/4) Epoch 48, batch 4050, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4948672.42 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:17:18,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1520346.6666666667, ans=0.125 2023-12-24 05:17:36,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1520480.0, ans=0.125 2023-12-24 05:17:39,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2023-12-24 05:17:45,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1520546.6666666667, ans=0.1 2023-12-24 05:18:02,141 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:18:05,529 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.431e+01 4.000e+01 4.154e+01 4.324e+01 5.032e+01, threshold=8.309e+01, percent-clipped=0.0 2023-12-24 05:18:06,510 INFO [train.py:886] (1/4) Epoch 48, batch 4100, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4943320.54 frames. 
], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:18:17,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1520746.6666666667, ans=0.125 2023-12-24 05:18:24,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1520746.6666666667, ans=0.09899494936611666 2023-12-24 05:18:40,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1520880.0, ans=0.0 2023-12-24 05:18:49,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1520946.6666666667, ans=0.125 2023-12-24 05:18:58,355 INFO [train.py:886] (1/4) Epoch 48, batch 4150, loss[loss=0.007772, audio_tagging_loss=0.007772, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4939743.51 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:19:03,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1521013.3333333333, ans=0.0 2023-12-24 05:19:07,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1521013.3333333333, ans=0.1 2023-12-24 05:19:12,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1521080.0, ans=0.125 2023-12-24 05:19:12,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1521080.0, ans=0.125 2023-12-24 05:19:15,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1521080.0, ans=0.125 2023-12-24 05:19:36,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1521213.3333333333, ans=0.1 2023-12-24 05:19:49,396 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 4.024e+01 4.153e+01 4.360e+01 4.986e+01, threshold=8.306e+01, percent-clipped=0.0 2023-12-24 05:19:50,388 INFO [train.py:886] (1/4) Epoch 48, batch 4200, loss[loss=0.009333, audio_tagging_loss=0.009333, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4939627.90 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:20:01,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1521413.3333333333, ans=0.05 2023-12-24 05:20:11,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1521480.0, ans=0.2 2023-12-24 05:20:27,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1521546.6666666667, ans=0.125 2023-12-24 05:20:40,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1521613.3333333333, ans=0.2 2023-12-24 05:20:43,081 INFO [train.py:886] (1/4) Epoch 48, batch 4250, loss[loss=0.01066, audio_tagging_loss=0.01066, over 23983.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4944953.99 frames. 
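In the `batch N, loss[...] tot_loss[...]` summaries, `loss` is the current batch alone while `tot_loss` is a frame-weighted aggregate that is decayed slightly on every update, which is why it moves smoothly near 0.0105-0.0108 while per-batch values jump between roughly 0.008 and 0.013. The frame counts hint at the decay: at about 25000 frames per batch, a steady-state total near 4.95e6 frames corresponds to a decay of roughly 1 − 1/200 per batch. A sketch under that assumption, with made-up names:

```python
class DecayedFrameLoss:
    """Frame-weighted running loss with per-batch decay, mimicking tot_loss[...]."""

    def __init__(self, decay: float = 1.0 - 1.0 / 200):  # assumed decay
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, num_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * num_frames
        self.frames = self.frames * self.decay + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = DecayedFrameLoss()
for _ in range(2000):
    tracker.update(batch_loss=0.0107, num_frames=25000.0)
print(tracker.frames)  # ~5.0e6, matching the "over ~4.95e6 frames" figures
print(tracker.value)   # 0.0107
```

The aggregate also restarts with each epoch, which is why the `Epoch 49, batch 0` summary further below reports `tot_loss` over a single batch's frames.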
], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:20:57,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-12-24 05:21:18,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1521880.0, ans=0.125 2023-12-24 05:21:31,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1521946.6666666667, ans=0.125 2023-12-24 05:21:33,912 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.572e+01 3.984e+01 4.154e+01 4.315e+01 4.731e+01, threshold=8.307e+01, percent-clipped=0.0 2023-12-24 05:21:34,914 INFO [train.py:886] (1/4) Epoch 48, batch 4300, loss[loss=0.01057, audio_tagging_loss=0.01057, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4946201.47 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:21:41,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1522013.3333333333, ans=0.1 2023-12-24 05:21:50,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1522080.0, ans=0.1 2023-12-24 05:21:50,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1522080.0, ans=0.0 2023-12-24 05:21:52,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1522080.0, ans=0.125 2023-12-24 05:21:56,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2023-12-24 05:21:59,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1522146.6666666667, ans=0.0 2023-12-24 05:22:04,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1522146.6666666667, ans=0.125 2023-12-24 05:22:09,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2023-12-24 05:22:26,763 INFO [train.py:886] (1/4) Epoch 48, batch 4350, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4954365.63 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:22:36,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1522413.3333333333, ans=0.04949747468305833 2023-12-24 05:22:37,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1522413.3333333333, ans=0.0 2023-12-24 05:22:52,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. 
limit=15.0 2023-12-24 05:23:02,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1522546.6666666667, ans=0.07 2023-12-24 05:23:06,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1522546.6666666667, ans=0.0 2023-12-24 05:23:07,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1522613.3333333333, ans=0.0 2023-12-24 05:23:18,154 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.751e+01 4.021e+01 4.163e+01 4.358e+01 4.850e+01, threshold=8.326e+01, percent-clipped=0.0 2023-12-24 05:23:19,141 INFO [train.py:886] (1/4) Epoch 48, batch 4400, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4951665.59 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:23:19,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1522680.0, ans=0.125 2023-12-24 05:23:32,337 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:23:43,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1522813.3333333333, ans=10.0 2023-12-24 05:23:59,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1522946.6666666667, ans=0.125 2023-12-24 05:24:10,773 INFO [train.py:886] (1/4) Epoch 48, batch 4450, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4948462.90 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:24:31,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1523146.6666666667, ans=0.125 2023-12-24 05:24:32,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1523146.6666666667, ans=0.125 2023-12-24 05:24:37,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1523146.6666666667, ans=0.1 2023-12-24 05:24:42,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1523213.3333333333, ans=0.125 2023-12-24 05:24:47,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1523213.3333333333, ans=0.1 2023-12-24 05:25:00,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1523280.0, ans=0.0 2023-12-24 05:25:01,880 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 4.053e+01 4.196e+01 4.337e+01 4.887e+01, threshold=8.393e+01, percent-clipped=0.0 2023-12-24 05:25:02,871 INFO [train.py:886] (1/4) Epoch 48, batch 4500, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4950336.24 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:25:09,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. 
limit=15.0 2023-12-24 05:25:09,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2023-12-24 05:25:23,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1523480.0, ans=0.125 2023-12-24 05:25:26,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1523480.0, ans=0.125 2023-12-24 05:25:36,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523546.6666666667, ans=0.1 2023-12-24 05:25:38,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2023-12-24 05:25:50,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1523613.3333333333, ans=0.125 2023-12-24 05:25:53,562 INFO [train.py:886] (1/4) Epoch 48, batch 4550, loss[loss=0.01061, audio_tagging_loss=0.01061, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4947059.31 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:26:02,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.14 vs. limit=10.0 2023-12-24 05:26:10,598 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:26:12,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1523746.6666666667, ans=0.0 2023-12-24 05:26:30,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1523880.0, ans=0.05 2023-12-24 05:26:44,558 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.666e+01 4.022e+01 4.224e+01 4.415e+01 4.872e+01, threshold=8.448e+01, percent-clipped=0.0 2023-12-24 05:26:45,543 INFO [train.py:886] (1/4) Epoch 48, batch 4600, loss[loss=0.01, audio_tagging_loss=0.01, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4956384.14 frames. 
], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:26:45,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1524013.3333333333, ans=0.2 2023-12-24 05:26:45,767 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:26:57,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1524080.0, ans=0.0 2023-12-24 05:26:59,331 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:27:00,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1524080.0, ans=0.0 2023-12-24 05:27:02,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1524080.0, ans=0.125 2023-12-24 05:27:09,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1524146.6666666667, ans=0.07 2023-12-24 05:27:38,195 INFO [train.py:886] (1/4) Epoch 48, batch 4650, loss[loss=0.009704, audio_tagging_loss=0.009704, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4960050.17 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:27:42,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1524346.6666666667, ans=0.2 2023-12-24 05:28:09,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1524546.6666666667, ans=0.125 2023-12-24 05:28:10,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1524546.6666666667, ans=0.125 2023-12-24 05:28:13,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2023-12-24 05:28:24,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1524613.3333333333, ans=0.125 2023-12-24 05:28:26,938 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.730e+01 4.056e+01 4.273e+01 4.476e+01 5.742e+01, threshold=8.545e+01, percent-clipped=0.0 2023-12-24 05:28:27,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1524680.0, ans=0.125 2023-12-24 05:28:27,900 INFO [train.py:886] (1/4) Epoch 48, batch 4700, loss[loss=0.01061, audio_tagging_loss=0.01061, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4957255.10 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:28:30,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1524680.0, ans=0.1 2023-12-24 05:28:37,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1524746.6666666667, ans=0.125 2023-12-24 05:28:43,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.08 vs. 
limit=22.5 2023-12-24 05:28:45,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1524746.6666666667, ans=0.0 2023-12-24 05:29:09,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1524946.6666666667, ans=0.125 2023-12-24 05:29:11,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.90 vs. limit=10.0 2023-12-24 05:29:15,580 INFO [train.py:886] (1/4) Epoch 48, batch 4750, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4950865.39 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:29:20,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1525013.3333333333, ans=0.125 2023-12-24 05:29:21,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1525013.3333333333, ans=0.125 2023-12-24 05:29:28,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1525080.0, ans=0.0 2023-12-24 05:29:50,893 INFO [train.py:886] (1/4) Epoch 49, batch 0, loss[loss=0.0256, audio_tagging_loss=0.0256, over 24024.00 frames. ], tot_loss[loss=0.0256, audio_tagging_loss=0.0256, over 24024.00 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 32.0 2023-12-24 05:29:50,894 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 05:30:12,075 INFO [train.py:917] (1/4) Epoch 49, validation: loss=0.03671, audio_tagging_loss=0.03671, over 3737520.00 frames. 2023-12-24 05:30:12,076 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 05:30:17,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1525120.0, ans=0.125 2023-12-24 05:30:23,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1525186.6666666667, ans=0.1 2023-12-24 05:30:44,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1525320.0, ans=0.2 2023-12-24 05:30:47,304 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.515e+01 4.166e+01 4.419e+01 5.711e+01 1.124e+02, threshold=8.838e+01, percent-clipped=6.0 2023-12-24 05:30:53,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2023-12-24 05:31:03,271 INFO [train.py:886] (1/4) Epoch 49, batch 50, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 1116958.07 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:31:20,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1525520.0, ans=0.0 2023-12-24 05:31:24,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-12-24 05:31:33,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. 
limit=15.0 2023-12-24 05:31:51,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1525720.0, ans=0.125 2023-12-24 05:31:54,150 INFO [train.py:886] (1/4) Epoch 49, batch 100, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 1968938.89 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:31:54,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1525786.6666666667, ans=0.05 2023-12-24 05:31:54,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1525786.6666666667, ans=0.125 2023-12-24 05:31:59,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1525786.6666666667, ans=0.09899494936611666 2023-12-24 05:32:01,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=15.0 2023-12-24 05:32:16,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1525920.0, ans=0.125 2023-12-24 05:32:28,985 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.922e+01 4.393e+01 4.587e+01 4.980e+01 5.717e+01, threshold=9.174e+01, percent-clipped=0.0 2023-12-24 05:32:38,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=12.0 2023-12-24 05:32:44,824 INFO [train.py:886] (1/4) Epoch 49, batch 150, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 2632249.54 frames. ], batch size: 99, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:33:04,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1526253.3333333333, ans=0.2 2023-12-24 05:33:30,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0 2023-12-24 05:33:32,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1526386.6666666667, ans=0.0 2023-12-24 05:33:37,112 INFO [train.py:886] (1/4) Epoch 49, batch 200, loss[loss=0.009367, audio_tagging_loss=0.009367, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 3147725.05 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:33:39,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1526453.3333333333, ans=0.125 2023-12-24 05:33:39,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. 
limit=15.0 2023-12-24 05:34:12,029 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.501e+01 4.050e+01 4.227e+01 4.381e+01 4.955e+01, threshold=8.454e+01, percent-clipped=0.0 2023-12-24 05:34:12,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1526653.3333333333, ans=0.125 2023-12-24 05:34:18,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1526720.0, ans=0.125 2023-12-24 05:34:28,165 INFO [train.py:886] (1/4) Epoch 49, batch 250, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24940.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 3548624.55 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:34:45,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1526853.3333333333, ans=0.125 2023-12-24 05:34:49,747 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:34:55,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1526920.0, ans=0.125 2023-12-24 05:35:19,240 INFO [train.py:886] (1/4) Epoch 49, batch 300, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 3862148.06 frames. ], batch size: 99, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:35:38,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-12-24 05:35:43,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-12-24 05:35:53,619 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.671e+01 4.036e+01 4.211e+01 4.378e+01 5.277e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 05:36:03,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1527386.6666666667, ans=0.1 2023-12-24 05:36:10,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1527453.3333333333, ans=0.1 2023-12-24 05:36:10,943 INFO [train.py:886] (1/4) Epoch 49, batch 350, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4093324.81 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 16.0 2023-12-24 05:36:15,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1527453.3333333333, ans=0.95 2023-12-24 05:36:19,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1527453.3333333333, ans=0.125 2023-12-24 05:36:23,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.01 vs. limit=10.0 2023-12-24 05:36:26,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. 
limit=15.0 2023-12-24 05:36:52,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1527720.0, ans=0.2 2023-12-24 05:37:01,191 INFO [train.py:886] (1/4) Epoch 49, batch 400, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4282764.84 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:37:24,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2023-12-24 05:37:34,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1527986.6666666667, ans=0.125 2023-12-24 05:37:35,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1527986.6666666667, ans=0.2 2023-12-24 05:37:36,419 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.568e+01 3.880e+01 4.116e+01 4.327e+01 4.784e+01, threshold=8.231e+01, percent-clipped=0.0 2023-12-24 05:37:40,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=12.0 2023-12-24 05:37:46,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1528053.3333333333, ans=0.125 2023-12-24 05:37:53,100 INFO [train.py:886] (1/4) Epoch 49, batch 450, loss[loss=0.01238, audio_tagging_loss=0.01238, over 21299.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4420310.09 frames. ], batch size: 107, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:38:03,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1528186.6666666667, ans=0.125 2023-12-24 05:38:10,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1528186.6666666667, ans=0.1 2023-12-24 05:38:17,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-12-24 05:38:23,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1528320.0, ans=0.125 2023-12-24 05:38:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1528320.0, ans=0.125 2023-12-24 05:38:31,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1528320.0, ans=0.125 2023-12-24 05:38:33,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1528386.6666666667, ans=0.125 2023-12-24 05:38:44,058 INFO [train.py:886] (1/4) Epoch 49, batch 500, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4539485.44 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:38:45,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. 
limit=15.0 2023-12-24 05:38:56,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1528520.0, ans=0.0 2023-12-24 05:38:59,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1528520.0, ans=0.125 2023-12-24 05:39:04,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1528586.6666666667, ans=0.1 2023-12-24 05:39:05,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2023-12-24 05:39:16,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1528653.3333333333, ans=0.04949747468305833 2023-12-24 05:39:16,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-12-24 05:39:18,320 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.604e+01 3.968e+01 4.140e+01 4.300e+01 5.100e+01, threshold=8.280e+01, percent-clipped=0.0 2023-12-24 05:39:24,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1528720.0, ans=0.1 2023-12-24 05:39:33,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1528720.0, ans=10.0 2023-12-24 05:39:33,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1528720.0, ans=0.125 2023-12-24 05:39:34,831 INFO [train.py:886] (1/4) Epoch 49, batch 550, loss[loss=0.009441, audio_tagging_loss=0.009441, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4638136.12 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:39:49,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1528853.3333333333, ans=0.2 2023-12-24 05:39:54,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1528920.0, ans=0.1 2023-12-24 05:40:06,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1528986.6666666667, ans=0.1 2023-12-24 05:40:08,435 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:40:09,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5 2023-12-24 05:40:17,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-12-24 05:40:25,957 INFO [train.py:886] (1/4) Epoch 49, batch 600, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4711098.89 frames. 
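The periodic `Computing validation loss ... validation: loss=..., over 3737520.00 frames` triples come from a full pass over the dev loader with the model in eval mode; the identical frame count at every check is expected, since the dev set is fixed. A minimal hedged sketch; `model`, `dev_loader`, and the batch keys are placeholders for the recipe's own objects:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_validation_loss(model, dev_loader, device):
    """Frame-weighted loss over a fixed dev set; a fixed set is why every
    validation line reports the same 'over 3737520.00 frames.'"""
    model.eval()
    loss_sum, frames = 0.0, 0.0
    for batch in dev_loader:
        feats = batch["features"].to(device)  # placeholder key: (batch, time, feat)
        labels = batch["targets"].to(device)  # placeholder key: multi-hot events
        num_frames = feats.shape[0] * feats.shape[1]
        loss = F.binary_cross_entropy_with_logits(model(feats), labels)
        loss_sum += float(loss) * num_frames
        frames += num_frames
    model.train()
    return loss_sum / frames, frames
```

Both validation checks in this stretch (epoch 48, batch 3000 and epoch 49, batch 0) land near 0.037, tracked separately from the much lower running training loss.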
], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:40:43,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1529186.6666666667, ans=0.125 2023-12-24 05:40:53,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1529253.3333333333, ans=0.125 2023-12-24 05:40:59,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1529320.0, ans=0.07 2023-12-24 05:41:01,068 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.737e+01 4.021e+01 4.199e+01 4.428e+01 6.437e+01, threshold=8.399e+01, percent-clipped=0.0 2023-12-24 05:41:05,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529320.0, ans=0.125 2023-12-24 05:41:14,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1529386.6666666667, ans=0.125 2023-12-24 05:41:18,329 INFO [train.py:886] (1/4) Epoch 49, batch 650, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4755424.30 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:41:36,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1529520.0, ans=0.125 2023-12-24 05:41:43,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1529586.6666666667, ans=0.0 2023-12-24 05:41:48,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529653.3333333333, ans=0.125 2023-12-24 05:41:56,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1529653.3333333333, ans=0.125 2023-12-24 05:42:09,763 INFO [train.py:886] (1/4) Epoch 49, batch 700, loss[loss=0.01025, audio_tagging_loss=0.01025, over 22269.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4794955.89 frames. ], batch size: 107, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:42:16,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-12-24 05:42:28,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1529853.3333333333, ans=0.025 2023-12-24 05:42:28,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1529853.3333333333, ans=0.0 2023-12-24 05:42:42,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2023-12-24 05:42:44,609 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 3.993e+01 4.206e+01 4.464e+01 5.068e+01, threshold=8.413e+01, percent-clipped=0.0 2023-12-24 05:42:48,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.90 vs. 
limit=15.0 2023-12-24 05:42:52,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1530053.3333333333, ans=0.0 2023-12-24 05:42:56,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1530053.3333333333, ans=0.1 2023-12-24 05:43:00,366 INFO [train.py:886] (1/4) Epoch 49, batch 750, loss[loss=0.0102, audio_tagging_loss=0.0102, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4825313.17 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:43:19,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.64 vs. limit=10.0 2023-12-24 05:43:29,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1530253.3333333333, ans=0.1 2023-12-24 05:43:52,130 INFO [train.py:886] (1/4) Epoch 49, batch 800, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4853959.91 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:43:57,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1530453.3333333333, ans=0.1 2023-12-24 05:44:16,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=12.0 2023-12-24 05:44:26,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-12-24 05:44:27,101 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.579e+01 3.966e+01 4.159e+01 4.355e+01 5.694e+01, threshold=8.318e+01, percent-clipped=0.0 2023-12-24 05:44:34,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1530720.0, ans=0.5 2023-12-24 05:44:36,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1530720.0, ans=0.125 2023-12-24 05:44:43,597 INFO [train.py:886] (1/4) Epoch 49, batch 850, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4882964.81 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:44:44,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1530786.6666666667, ans=0.125 2023-12-24 05:44:50,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-12-24 05:45:16,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1530986.6666666667, ans=0.125 2023-12-24 05:45:17,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.46 vs. 
limit=22.5 2023-12-24 05:45:18,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1530986.6666666667, ans=0.0 2023-12-24 05:45:19,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1530986.6666666667, ans=0.0 2023-12-24 05:45:25,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1531053.3333333333, ans=0.1 2023-12-24 05:45:34,378 INFO [train.py:886] (1/4) Epoch 49, batch 900, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4903038.54 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:45:34,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1531120.0, ans=0.0 2023-12-24 05:45:50,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1531186.6666666667, ans=0.125 2023-12-24 05:46:00,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0 2023-12-24 05:46:00,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1531253.3333333333, ans=0.1 2023-12-24 05:46:06,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1531320.0, ans=0.125 2023-12-24 05:46:08,730 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.704e+01 4.050e+01 4.200e+01 4.404e+01 5.257e+01, threshold=8.399e+01, percent-clipped=0.0 2023-12-24 05:46:16,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1531386.6666666667, ans=0.125 2023-12-24 05:46:22,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2023-12-24 05:46:24,532 INFO [train.py:886] (1/4) Epoch 49, batch 950, loss[loss=0.01053, audio_tagging_loss=0.01053, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4914456.46 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:46:35,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1531520.0, ans=0.125 2023-12-24 05:46:41,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1531520.0, ans=0.2 2023-12-24 05:46:42,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1531520.0, ans=0.125 2023-12-24 05:46:58,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1531653.3333333333, ans=0.125 2023-12-24 05:47:11,004 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:47:14,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1531720.0, ans=0.125 2023-12-24 05:47:17,018 INFO [train.py:886] (1/4) Epoch 49, batch 1000, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4914477.00 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:47:21,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-12-24 05:47:30,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-12-24 05:47:48,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1531986.6666666667, ans=0.05 2023-12-24 05:47:50,841 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.669e+01 3.990e+01 4.143e+01 4.294e+01 8.017e+01, threshold=8.286e+01, percent-clipped=0.0 2023-12-24 05:47:51,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-12-24 05:48:01,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1532053.3333333333, ans=0.125 2023-12-24 05:48:06,661 INFO [train.py:886] (1/4) Epoch 49, batch 1050, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4919871.93 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:48:08,503 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:48:41,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1532320.0, ans=0.0 2023-12-24 05:48:41,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1532320.0, ans=0.2 2023-12-24 05:48:44,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1532320.0, ans=0.125 2023-12-24 05:48:48,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1532386.6666666667, ans=0.95 2023-12-24 05:48:50,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1532386.6666666667, ans=0.0 2023-12-24 05:48:55,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1532386.6666666667, ans=0.125 2023-12-24 05:48:58,207 INFO [train.py:886] (1/4) Epoch 49, batch 1100, loss[loss=0.009752, audio_tagging_loss=0.009752, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4927683.50 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:49:15,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2023-12-24 05:49:16,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1532520.0, ans=0.95 2023-12-24 05:49:20,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1532586.6666666667, ans=0.125 2023-12-24 05:49:33,437 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 3.973e+01 4.201e+01 4.376e+01 5.038e+01, threshold=8.402e+01, percent-clipped=0.0 2023-12-24 05:49:35,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1532653.3333333333, ans=0.125 2023-12-24 05:49:38,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1532720.0, ans=0.125 2023-12-24 05:49:50,814 INFO [train.py:886] (1/4) Epoch 49, batch 1150, loss[loss=0.01012, audio_tagging_loss=0.01012, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4935291.14 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:50:10,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1532920.0, ans=0.125 2023-12-24 05:50:20,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1532920.0, ans=0.0 2023-12-24 05:50:42,340 INFO [train.py:886] (1/4) Epoch 49, batch 1200, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4941300.56 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:50:48,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-12-24 05:50:49,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1533120.0, ans=0.1 2023-12-24 05:50:57,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1533186.6666666667, ans=0.125 2023-12-24 05:50:58,369 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:51:16,884 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.561e+01 4.054e+01 4.177e+01 4.431e+01 4.907e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 05:51:22,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1533320.0, ans=0.0 2023-12-24 05:51:24,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1533386.6666666667, ans=0.0 2023-12-24 05:51:33,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5 2023-12-24 05:51:34,932 INFO [train.py:886] (1/4) Epoch 49, batch 1250, loss[loss=0.00992, audio_tagging_loss=0.00992, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4934067.40 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:51:36,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1533453.3333333333, ans=0.0 2023-12-24 05:51:37,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1533453.3333333333, ans=0.1 2023-12-24 05:51:38,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1533453.3333333333, ans=0.125 2023-12-24 05:51:40,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2023-12-24 05:51:41,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. 
limit=6.0 2023-12-24 05:51:43,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1533520.0, ans=0.02 2023-12-24 05:51:51,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1533520.0, ans=0.025 2023-12-24 05:52:05,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1533653.3333333333, ans=0.125 2023-12-24 05:52:08,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1533653.3333333333, ans=0.125 2023-12-24 05:52:19,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1533720.0, ans=0.0 2023-12-24 05:52:25,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1533786.6666666667, ans=0.1 2023-12-24 05:52:26,494 INFO [train.py:886] (1/4) Epoch 49, batch 1300, loss[loss=0.009657, audio_tagging_loss=0.009657, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4928137.30 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:52:36,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2023-12-24 05:52:38,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1533853.3333333333, ans=0.125 2023-12-24 05:52:40,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1533853.3333333333, ans=0.1 2023-12-24 05:52:43,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1533853.3333333333, ans=0.125 2023-12-24 05:53:01,778 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.636e+01 4.042e+01 4.229e+01 4.423e+01 4.890e+01, threshold=8.459e+01, percent-clipped=0.0 2023-12-24 05:53:04,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1533986.6666666667, ans=0.125 2023-12-24 05:53:09,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1534053.3333333333, ans=0.0 2023-12-24 05:53:10,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-12-24 05:53:17,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1534120.0, ans=0.0 2023-12-24 05:53:18,437 INFO [train.py:886] (1/4) Epoch 49, batch 1350, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4934987.52 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:53:24,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2023-12-24 05:53:26,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. 
limit=22.5 2023-12-24 05:53:41,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1534253.3333333333, ans=0.07 2023-12-24 05:53:57,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1534320.0, ans=0.125 2023-12-24 05:54:10,949 INFO [train.py:886] (1/4) Epoch 49, batch 1400, loss[loss=0.009983, audio_tagging_loss=0.009983, over 25000.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4942775.34 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:54:13,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1534453.3333333333, ans=0.0 2023-12-24 05:54:36,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-12-24 05:54:40,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-12-24 05:54:45,411 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.570e+01 3.938e+01 4.141e+01 4.265e+01 4.875e+01, threshold=8.281e+01, percent-clipped=0.0 2023-12-24 05:54:46,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1534653.3333333333, ans=0.0 2023-12-24 05:54:56,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1534720.0, ans=0.125 2023-12-24 05:55:00,598 INFO [train.py:886] (1/4) Epoch 49, batch 1450, loss[loss=0.01069, audio_tagging_loss=0.01069, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4948908.54 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:55:00,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1534786.6666666667, ans=0.0 2023-12-24 05:55:12,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1534853.3333333333, ans=0.125 2023-12-24 05:55:26,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1534920.0, ans=0.05 2023-12-24 05:55:53,555 INFO [train.py:886] (1/4) Epoch 49, batch 1500, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4950809.68 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:56:03,285 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:56:08,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1535186.6666666667, ans=0.125 2023-12-24 05:56:14,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1535253.3333333333, ans=0.125 2023-12-24 05:56:26,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535320.0, ans=0.1 2023-12-24 05:56:28,576 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 4.106e+01 4.257e+01 4.421e+01 5.889e+01, threshold=8.514e+01, percent-clipped=0.0 2023-12-24 05:56:38,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1535386.6666666667, ans=0.125 2023-12-24 05:56:45,304 INFO [train.py:886] (1/4) Epoch 49, batch 1550, loss[loss=0.009752, audio_tagging_loss=0.009752, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4951325.46 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:56:45,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1535453.3333333333, ans=0.2 2023-12-24 05:57:09,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2023-12-24 05:57:22,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535653.3333333333, ans=0.1 2023-12-24 05:57:26,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1535720.0, ans=0.125 2023-12-24 05:57:37,142 INFO [train.py:886] (1/4) Epoch 49, batch 1600, loss[loss=0.01076, audio_tagging_loss=0.01076, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4949962.39 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:58:12,890 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.640e+01 4.058e+01 4.221e+01 4.400e+01 5.973e+01, threshold=8.442e+01, percent-clipped=0.0 2023-12-24 05:58:17,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0 2023-12-24 05:58:26,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1536053.3333333333, ans=0.0 2023-12-24 05:58:30,231 INFO [train.py:886] (1/4) Epoch 49, batch 1650, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4946723.35 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:58:33,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1536120.0, ans=0.1 2023-12-24 05:58:50,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1536253.3333333333, ans=0.125 2023-12-24 05:58:51,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-12-24 05:58:52,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1536253.3333333333, ans=0.07 2023-12-24 05:59:21,938 INFO [train.py:886] (1/4) Epoch 49, batch 1700, loss[loss=0.009724, audio_tagging_loss=0.009724, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4941541.59 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:59:44,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1536586.6666666667, ans=0.125 2023-12-24 05:59:44,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1536586.6666666667, ans=0.125 2023-12-24 05:59:57,644 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.027e+01 4.185e+01 4.353e+01 5.164e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 05:59:57,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1536653.3333333333, ans=0.125 2023-12-24 05:59:59,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1536653.3333333333, ans=0.125 2023-12-24 06:00:01,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1536653.3333333333, ans=0.125 2023-12-24 06:00:13,568 INFO [train.py:886] (1/4) Epoch 49, batch 1750, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4943842.25 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:00:18,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1536786.6666666667, ans=0.1 2023-12-24 06:00:50,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2023-12-24 06:01:05,972 INFO [train.py:886] (1/4) Epoch 49, batch 1800, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4948460.71 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:01:08,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1537120.0, ans=0.125 2023-12-24 06:01:41,127 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.734e+01 4.057e+01 4.201e+01 4.362e+01 5.500e+01, threshold=8.403e+01, percent-clipped=0.0 2023-12-24 06:01:57,702 INFO [train.py:886] (1/4) Epoch 49, batch 1850, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. 
], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4951399.55 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:02:06,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2023-12-24 06:02:09,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1537520.0, ans=0.1 2023-12-24 06:02:14,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1537520.0, ans=0.125 2023-12-24 06:02:17,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0 2023-12-24 06:02:21,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0 2023-12-24 06:02:28,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1537653.3333333333, ans=0.0 2023-12-24 06:02:29,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1537653.3333333333, ans=0.125 2023-12-24 06:02:36,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1537653.3333333333, ans=0.125 2023-12-24 06:02:49,914 INFO [train.py:886] (1/4) Epoch 49, batch 1900, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4941692.70 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:02:53,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1537786.6666666667, ans=0.125 2023-12-24 06:03:03,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1537853.3333333333, ans=0.0 2023-12-24 06:03:09,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1537920.0, ans=0.125 2023-12-24 06:03:12,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-12-24 06:03:24,923 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.654e+01 4.070e+01 4.198e+01 4.398e+01 6.870e+01, threshold=8.397e+01, percent-clipped=0.0 2023-12-24 06:03:40,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1538120.0, ans=0.125 2023-12-24 06:03:41,670 INFO [train.py:886] (1/4) Epoch 49, batch 1950, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4939301.25 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:04:00,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1538186.6666666667, ans=0.1 2023-12-24 06:04:33,126 INFO [train.py:886] (1/4) Epoch 49, batch 2000, loss[loss=0.00874, audio_tagging_loss=0.00874, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4942556.36 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:04:55,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0 2023-12-24 06:05:08,350 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.984e+01 4.128e+01 4.387e+01 6.325e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 06:05:08,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1538653.3333333333, ans=0.125 2023-12-24 06:05:18,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=22.5 2023-12-24 06:05:26,322 INFO [train.py:886] (1/4) Epoch 49, batch 2050, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4947439.44 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:05:30,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1538786.6666666667, ans=0.0 2023-12-24 06:05:30,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-12-24 06:06:17,054 INFO [train.py:886] (1/4) Epoch 49, batch 2100, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4948381.05 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:06:20,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1539120.0, ans=0.125 2023-12-24 06:06:42,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1539253.3333333333, ans=0.125 2023-12-24 06:06:48,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1539320.0, ans=0.125 2023-12-24 06:06:52,077 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.619e+01 3.997e+01 4.198e+01 4.409e+01 5.519e+01, threshold=8.397e+01, percent-clipped=0.0 2023-12-24 06:07:09,406 INFO [train.py:886] (1/4) Epoch 49, batch 2150, loss[loss=0.01045, audio_tagging_loss=0.01045, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4949084.52 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:07:14,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1539453.3333333333, ans=0.125 2023-12-24 06:07:17,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1539453.3333333333, ans=0.125 2023-12-24 06:07:37,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-12-24 06:07:42,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.69 vs. 
limit=15.0 2023-12-24 06:07:43,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1539653.3333333333, ans=0.125 2023-12-24 06:07:46,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1539653.3333333333, ans=0.125 2023-12-24 06:07:59,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1539786.6666666667, ans=0.2 2023-12-24 06:08:01,535 INFO [train.py:886] (1/4) Epoch 49, batch 2200, loss[loss=0.01037, audio_tagging_loss=0.01037, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4944344.41 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:08:03,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1539786.6666666667, ans=0.125 2023-12-24 06:08:31,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.43 vs. limit=5.0 2023-12-24 06:08:36,617 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.719e+01 4.113e+01 4.276e+01 4.515e+01 5.433e+01, threshold=8.552e+01, percent-clipped=0.0 2023-12-24 06:08:40,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1539986.6666666667, ans=0.0 2023-12-24 06:08:51,808 INFO [train.py:886] (1/4) Epoch 49, batch 2250, loss[loss=0.01057, audio_tagging_loss=0.01057, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4937311.29 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:09:05,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1540186.6666666667, ans=0.0 2023-12-24 06:09:10,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1540186.6666666667, ans=0.125 2023-12-24 06:09:15,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1540253.3333333333, ans=0.1 2023-12-24 06:09:18,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-12-24 06:09:18,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1540253.3333333333, ans=0.125 2023-12-24 06:09:18,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1540253.3333333333, ans=0.04949747468305833 2023-12-24 06:09:19,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1540253.3333333333, ans=0.0 2023-12-24 06:09:24,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.53 vs. 
limit=12.0 2023-12-24 06:09:29,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1540320.0, ans=0.125 2023-12-24 06:09:33,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1540386.6666666667, ans=0.125 2023-12-24 06:09:45,108 INFO [train.py:886] (1/4) Epoch 49, batch 2300, loss[loss=0.009409, audio_tagging_loss=0.009409, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4942066.59 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:09:47,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1540453.3333333333, ans=0.125 2023-12-24 06:10:21,202 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.695e+01 3.977e+01 4.114e+01 4.288e+01 4.900e+01, threshold=8.227e+01, percent-clipped=0.0 2023-12-24 06:10:36,360 INFO [train.py:886] (1/4) Epoch 49, batch 2350, loss[loss=0.01003, audio_tagging_loss=0.01003, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4945154.92 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:10:45,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1540786.6666666667, ans=0.1 2023-12-24 06:10:51,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1540853.3333333333, ans=0.125 2023-12-24 06:11:02,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1540920.0, ans=0.0 2023-12-24 06:11:28,818 INFO [train.py:886] (1/4) Epoch 49, batch 2400, loss[loss=0.009216, audio_tagging_loss=0.009216, over 25000.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4948082.76 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:11:32,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1541120.0, ans=0.125 2023-12-24 06:11:32,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1541120.0, ans=0.125 2023-12-24 06:11:41,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1541186.6666666667, ans=0.125 2023-12-24 06:11:41,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1541186.6666666667, ans=0.125 2023-12-24 06:11:56,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5 2023-12-24 06:11:57,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-12-24 06:12:04,396 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.621e+01 3.987e+01 4.152e+01 4.367e+01 5.469e+01, threshold=8.304e+01, percent-clipped=0.0 2023-12-24 06:12:20,354 INFO [train.py:886] (1/4) Epoch 49, batch 2450, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4950889.66 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:12:24,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2023-12-24 06:12:28,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=22.5 2023-12-24 06:12:41,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1541586.6666666667, ans=0.0 2023-12-24 06:13:11,144 INFO [train.py:886] (1/4) Epoch 49, batch 2500, loss[loss=0.009653, audio_tagging_loss=0.009653, over 24750.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4950065.20 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:13:11,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1541786.6666666667, ans=15.0 2023-12-24 06:13:30,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1541853.3333333333, ans=0.125 2023-12-24 06:13:47,840 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.799e+01 4.107e+01 4.264e+01 4.424e+01 5.486e+01, threshold=8.528e+01, percent-clipped=0.0 2023-12-24 06:13:48,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1541986.6666666667, ans=0.0 2023-12-24 06:13:55,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1542053.3333333333, ans=0.125 2023-12-24 06:14:03,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1542120.0, ans=0.125 2023-12-24 06:14:04,220 INFO [train.py:886] (1/4) Epoch 49, batch 2550, loss[loss=0.009858, audio_tagging_loss=0.009858, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4944669.76 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:14:16,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1542186.6666666667, ans=0.0 2023-12-24 06:14:17,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1542186.6666666667, ans=0.1 2023-12-24 06:14:30,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1542253.3333333333, ans=0.125 2023-12-24 06:14:38,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1542320.0, ans=0.09899494936611666 2023-12-24 06:14:42,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1542320.0, ans=0.125 2023-12-24 06:14:55,074 INFO [train.py:886] (1/4) Epoch 49, batch 2600, loss[loss=0.01089, audio_tagging_loss=0.01089, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4947575.93 frames. 
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:15:13,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1542520.0, ans=0.1 2023-12-24 06:15:21,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1542586.6666666667, ans=0.125 2023-12-24 06:15:27,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1542653.3333333333, ans=0.07 2023-12-24 06:15:29,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1542653.3333333333, ans=0.1 2023-12-24 06:15:32,853 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.739e+01 4.055e+01 4.229e+01 4.404e+01 4.899e+01, threshold=8.458e+01, percent-clipped=0.0 2023-12-24 06:15:47,951 INFO [train.py:886] (1/4) Epoch 49, batch 2650, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4947581.03 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:16:15,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1542920.0, ans=0.1 2023-12-24 06:16:20,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1542986.6666666667, ans=0.125 2023-12-24 06:16:21,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1542986.6666666667, ans=0.125 2023-12-24 06:16:34,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1543053.3333333333, ans=0.0 2023-12-24 06:16:40,431 INFO [train.py:886] (1/4) Epoch 49, batch 2700, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4945924.26 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:16:41,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1543120.0, ans=0.05 2023-12-24 06:16:45,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. 
limit=22.5 2023-12-24 06:16:51,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1543186.6666666667, ans=0.125 2023-12-24 06:16:52,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1543186.6666666667, ans=0.2 2023-12-24 06:16:54,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1543186.6666666667, ans=0.0 2023-12-24 06:16:54,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1543186.6666666667, ans=10.0 2023-12-24 06:17:04,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1543253.3333333333, ans=0.2 2023-12-24 06:17:16,720 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.964e+01 4.171e+01 4.415e+01 4.994e+01, threshold=8.341e+01, percent-clipped=0.0 2023-12-24 06:17:31,775 INFO [train.py:886] (1/4) Epoch 49, batch 2750, loss[loss=0.009776, audio_tagging_loss=0.009776, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4950921.81 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:17:45,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2023-12-24 06:17:52,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2023-12-24 06:18:12,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1543720.0, ans=0.125 2023-12-24 06:18:18,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1543720.0, ans=0.0 2023-12-24 06:18:23,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1543786.6666666667, ans=0.125 2023-12-24 06:18:24,107 INFO [train.py:886] (1/4) Epoch 49, batch 2800, loss[loss=0.0111, audio_tagging_loss=0.0111, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4941632.06 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:18:31,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. 
limit=6.0 2023-12-24 06:18:31,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1543786.6666666667, ans=0.0 2023-12-24 06:18:38,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1543853.3333333333, ans=0.125 2023-12-24 06:18:59,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1543986.6666666667, ans=0.0 2023-12-24 06:19:00,880 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 4.058e+01 4.176e+01 4.407e+01 5.903e+01, threshold=8.351e+01, percent-clipped=0.0 2023-12-24 06:19:05,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-12-24 06:19:16,498 INFO [train.py:886] (1/4) Epoch 49, batch 2850, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4935871.31 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:19:17,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0 2023-12-24 06:19:24,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1544120.0, ans=0.125 2023-12-24 06:19:27,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1544186.6666666667, ans=0.125 2023-12-24 06:19:35,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1544253.3333333333, ans=0.1 2023-12-24 06:19:40,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1544253.3333333333, ans=0.125 2023-12-24 06:19:46,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1544320.0, ans=0.0 2023-12-24 06:19:49,931 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:20:00,138 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:20:08,191 INFO [train.py:886] (1/4) Epoch 49, batch 2900, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4936410.26 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:20:17,585 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:20:32,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1544586.6666666667, ans=0.125 2023-12-24 06:20:34,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1544586.6666666667, ans=0.0 2023-12-24 06:20:44,088 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 4.050e+01 4.220e+01 4.394e+01 4.987e+01, threshold=8.439e+01, percent-clipped=0.0 2023-12-24 06:20:48,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1544720.0, ans=0.125 2023-12-24 06:20:59,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1544786.6666666667, ans=0.125 2023-12-24 06:21:00,376 INFO [train.py:886] (1/4) Epoch 49, batch 2950, loss[loss=0.00941, audio_tagging_loss=0.00941, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4933718.38 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:21:11,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1544853.3333333333, ans=0.125 2023-12-24 06:21:18,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1544853.3333333333, ans=0.125 2023-12-24 06:21:18,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1544853.3333333333, ans=0.125 2023-12-24 06:21:33,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1544986.6666666667, ans=0.2 2023-12-24 06:21:37,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1544986.6666666667, ans=0.0 2023-12-24 06:21:42,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1545053.3333333333, ans=0.125 2023-12-24 06:21:52,350 INFO [train.py:886] (1/4) Epoch 49, batch 3000, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4942257.79 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:21:52,350 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 06:22:10,330 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6236, 3.7389, 3.4618, 3.2826], device='cuda:1') 2023-12-24 06:22:13,826 INFO [train.py:917] (1/4) Epoch 49, validation: loss=0.03737, audio_tagging_loss=0.03737, over 3737520.00 frames. 2023-12-24 06:22:13,826 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 06:22:19,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2023-12-24 06:22:25,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.65 vs. 
limit=15.0 2023-12-24 06:22:35,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1545253.3333333333, ans=0.0 2023-12-24 06:22:50,401 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.989e+01 4.185e+01 4.456e+01 5.215e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 06:23:00,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1545386.6666666667, ans=0.125 2023-12-24 06:23:06,472 INFO [train.py:886] (1/4) Epoch 49, batch 3050, loss[loss=0.008851, audio_tagging_loss=0.008851, over 25000.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4949785.54 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:23:13,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1545453.3333333333, ans=0.125 2023-12-24 06:23:17,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1545520.0, ans=0.125 2023-12-24 06:23:24,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1545520.0, ans=0.125 2023-12-24 06:23:28,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1545586.6666666667, ans=22.5 2023-12-24 06:23:35,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1545586.6666666667, ans=0.125 2023-12-24 06:23:50,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1545720.0, ans=0.125 2023-12-24 06:23:57,221 INFO [train.py:886] (1/4) Epoch 49, batch 3100, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4948488.77 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:24:07,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1545853.3333333333, ans=0.1 2023-12-24 06:24:12,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2023-12-24 06:24:13,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1545853.3333333333, ans=0.125 2023-12-24 06:24:25,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1545920.0, ans=0.5 2023-12-24 06:24:33,099 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.660e+01 4.061e+01 4.253e+01 4.429e+01 4.827e+01, threshold=8.507e+01, percent-clipped=0.0 2023-12-24 06:24:40,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1546053.3333333333, ans=0.07 2023-12-24 06:24:47,146 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:24:47,933 INFO [train.py:886] (1/4) Epoch 49, batch 3150, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24750.00 frames. 
], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4943281.32 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:24:56,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1546120.0, ans=0.0 2023-12-24 06:25:11,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1546253.3333333333, ans=0.0 2023-12-24 06:25:21,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1546320.0, ans=0.2 2023-12-24 06:25:40,631 INFO [train.py:886] (1/4) Epoch 49, batch 3200, loss[loss=0.009926, audio_tagging_loss=0.009926, over 24750.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4943459.01 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:25:50,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1546520.0, ans=0.0 2023-12-24 06:25:54,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-12-24 06:26:02,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-12-24 06:26:04,101 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:26:18,570 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.708e+01 4.064e+01 4.235e+01 4.462e+01 5.298e+01, threshold=8.470e+01, percent-clipped=0.0 2023-12-24 06:26:33,564 INFO [train.py:886] (1/4) Epoch 49, batch 3250, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4945313.83 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:26:40,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1546786.6666666667, ans=0.125 2023-12-24 06:26:48,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1546853.3333333333, ans=0.0 2023-12-24 06:26:52,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1546853.3333333333, ans=0.125 2023-12-24 06:26:57,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2023-12-24 06:26:59,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1546920.0, ans=0.125 2023-12-24 06:27:03,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-12-24 06:27:18,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1547053.3333333333, ans=0.125 2023-12-24 06:27:26,092 INFO [train.py:886] (1/4) Epoch 49, batch 3300, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4948093.33 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:27:51,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2023-12-24 06:27:57,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-12-24 06:28:01,842 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.618e+01 4.007e+01 4.177e+01 4.374e+01 5.032e+01, threshold=8.354e+01, percent-clipped=0.0 2023-12-24 06:28:05,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2023-12-24 06:28:17,562 INFO [train.py:886] (1/4) Epoch 49, batch 3350, loss[loss=0.01009, audio_tagging_loss=0.01009, over 25000.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4951300.74 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:28:18,114 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2023-12-24 06:28:18,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2023-12-24 06:28:25,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-12-24 06:28:25,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1547453.3333333333, ans=0.125 2023-12-24 06:28:26,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1547453.3333333333, ans=0.0 2023-12-24 06:28:29,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1547520.0, ans=0.125 2023-12-24 06:28:59,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.74 vs. limit=10.0 2023-12-24 06:29:00,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1547720.0, ans=0.2 2023-12-24 06:29:08,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1547786.6666666667, ans=0.0 2023-12-24 06:29:09,057 INFO [train.py:886] (1/4) Epoch 49, batch 3400, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4953161.27 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:29:18,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. 
limit=15.0 2023-12-24 06:29:39,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1547986.6666666667, ans=0.0 2023-12-24 06:29:40,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1547986.6666666667, ans=0.2 2023-12-24 06:29:45,480 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.515e+01 4.092e+01 4.242e+01 4.462e+01 5.102e+01, threshold=8.484e+01, percent-clipped=0.0 2023-12-24 06:29:59,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1548053.3333333333, ans=0.1 2023-12-24 06:30:02,456 INFO [train.py:886] (1/4) Epoch 49, batch 3450, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4950577.27 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:30:09,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1548120.0, ans=0.0 2023-12-24 06:30:13,929 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:30:41,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1548386.6666666667, ans=0.125 2023-12-24 06:30:41,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1548386.6666666667, ans=0.07 2023-12-24 06:30:43,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1548386.6666666667, ans=0.125 2023-12-24 06:30:47,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1548386.6666666667, ans=0.125 2023-12-24 06:30:52,206 INFO [train.py:886] (1/4) Epoch 49, batch 3500, loss[loss=0.01035, audio_tagging_loss=0.01035, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4945154.42 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:30:52,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1548453.3333333333, ans=0.125 2023-12-24 06:31:06,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2023-12-24 06:31:22,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2023-12-24 06:31:29,131 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.463e+01 4.041e+01 4.186e+01 4.358e+01 4.992e+01, threshold=8.372e+01, percent-clipped=0.0 2023-12-24 06:31:30,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1548653.3333333333, ans=0.0 2023-12-24 06:31:40,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1548720.0, ans=0.125 2023-12-24 06:31:44,754 INFO [train.py:886] (1/4) Epoch 49, batch 3550, loss[loss=0.008696, audio_tagging_loss=0.008696, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4948914.98 frames. 
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:32:05,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1548920.0, ans=0.125 2023-12-24 06:32:18,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1548986.6666666667, ans=0.04949747468305833 2023-12-24 06:32:26,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2023-12-24 06:32:34,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1549053.3333333333, ans=0.125 2023-12-24 06:32:36,661 INFO [train.py:886] (1/4) Epoch 49, batch 3600, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4954484.52 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:32:37,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1549120.0, ans=0.0 2023-12-24 06:32:51,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1549186.6666666667, ans=0.5 2023-12-24 06:32:52,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1549186.6666666667, ans=0.2 2023-12-24 06:32:57,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1549253.3333333333, ans=0.125 2023-12-24 06:33:11,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1549320.0, ans=0.1 2023-12-24 06:33:11,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1549320.0, ans=0.02 2023-12-24 06:33:13,502 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.476e+01 4.039e+01 4.199e+01 4.373e+01 5.772e+01, threshold=8.398e+01, percent-clipped=0.0 2023-12-24 06:33:13,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1549320.0, ans=0.125 2023-12-24 06:33:21,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1549386.6666666667, ans=0.2 2023-12-24 06:33:22,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1549386.6666666667, ans=0.125 2023-12-24 06:33:28,336 INFO [train.py:886] (1/4) Epoch 49, batch 3650, loss[loss=0.008394, audio_tagging_loss=0.008394, over 21577.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4956567.17 frames. ], batch size: 107, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:33:32,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=15.0 2023-12-24 06:33:35,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=10.0 2023-12-24 06:34:01,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1549653.3333333333, ans=0.0 2023-12-24 06:34:05,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1549653.3333333333, ans=0.07 2023-12-24 06:34:05,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2023-12-24 06:34:20,958 INFO [train.py:886] (1/4) Epoch 49, batch 3700, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4960909.90 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:34:22,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1549786.6666666667, ans=0.0 2023-12-24 06:34:23,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1549786.6666666667, ans=0.0 2023-12-24 06:34:32,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2023-12-24 06:34:33,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1549853.3333333333, ans=0.2 2023-12-24 06:34:45,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2023-12-24 06:34:50,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-12-24 06:34:50,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2023-12-24 06:34:54,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-12-24 06:34:57,826 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 4.052e+01 4.232e+01 4.524e+01 5.047e+01, threshold=8.465e+01, percent-clipped=0.0 2023-12-24 06:35:12,707 INFO [train.py:886] (1/4) Epoch 49, batch 3750, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4954913.02 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:35:26,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1550186.6666666667, ans=0.1 2023-12-24 06:35:28,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1550186.6666666667, ans=0.0 2023-12-24 06:35:32,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1550186.6666666667, ans=0.0 2023-12-24 06:35:51,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1550320.0, ans=0.125 2023-12-24 06:35:53,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1550386.6666666667, ans=0.125 2023-12-24 06:36:04,940 INFO [train.py:886] (1/4) Epoch 49, batch 3800, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4945466.84 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:36:27,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1550586.6666666667, ans=0.0 2023-12-24 06:36:41,087 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 4.023e+01 4.204e+01 4.391e+01 4.921e+01, threshold=8.409e+01, percent-clipped=0.0 2023-12-24 06:36:55,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1550720.0, ans=0.1 2023-12-24 06:36:57,919 INFO [train.py:886] (1/4) Epoch 49, batch 3850, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4940514.85 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:37:01,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1550786.6666666667, ans=0.1 2023-12-24 06:37:07,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0 2023-12-24 06:37:11,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1550853.3333333333, ans=0.125 2023-12-24 06:37:24,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1550920.0, ans=0.02 2023-12-24 06:37:35,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.43 vs. limit=12.0 2023-12-24 06:37:46,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1551053.3333333333, ans=0.0 2023-12-24 06:37:49,542 INFO [train.py:886] (1/4) Epoch 49, batch 3900, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4946170.35 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:37:49,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1551120.0, ans=0.2 2023-12-24 06:37:58,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1551120.0, ans=0.125 2023-12-24 06:38:25,558 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.639e+01 4.074e+01 4.182e+01 4.374e+01 4.953e+01, threshold=8.363e+01, percent-clipped=0.0 2023-12-24 06:38:39,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1551386.6666666667, ans=0.125 2023-12-24 06:38:41,228 INFO [train.py:886] (1/4) Epoch 49, batch 3950, loss[loss=0.008522, audio_tagging_loss=0.008522, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4948987.32 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:39:02,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1551586.6666666667, ans=0.1 2023-12-24 06:39:06,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1551586.6666666667, ans=0.1 2023-12-24 06:39:07,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=12.0 2023-12-24 06:39:12,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1551653.3333333333, ans=0.2 2023-12-24 06:39:25,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1551720.0, ans=0.0 2023-12-24 06:39:28,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1551720.0, ans=0.0 2023-12-24 06:39:33,410 INFO [train.py:886] (1/4) Epoch 49, batch 4000, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4954001.44 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:39:49,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1551853.3333333333, ans=0.5 2023-12-24 06:40:07,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1551986.6666666667, ans=0.125 2023-12-24 06:40:07,999 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2023-12-24 06:40:09,394 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.560e+01 4.058e+01 4.250e+01 4.476e+01 5.419e+01, threshold=8.500e+01, percent-clipped=0.0 2023-12-24 06:40:17,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1552053.3333333333, ans=0.125 2023-12-24 06:40:24,543 INFO [train.py:886] (1/4) Epoch 49, batch 4050, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4954342.26 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:40:30,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1552120.0, ans=0.2 2023-12-24 06:40:39,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1552186.6666666667, ans=0.125 2023-12-24 06:40:51,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1552253.3333333333, ans=0.0 2023-12-24 06:40:55,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1552320.0, ans=0.0 2023-12-24 06:41:05,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1552320.0, ans=0.1 2023-12-24 06:41:15,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1552386.6666666667, ans=0.125 2023-12-24 06:41:17,514 INFO [train.py:886] (1/4) Epoch 49, batch 4100, loss[loss=0.01032, audio_tagging_loss=0.01032, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4945141.92 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:41:20,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1552453.3333333333, ans=0.125 2023-12-24 06:41:24,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1552453.3333333333, ans=0.125 2023-12-24 06:41:26,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1552520.0, ans=0.125 2023-12-24 06:41:32,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1552520.0, ans=0.125 2023-12-24 06:41:35,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.53 vs. limit=10.0 2023-12-24 06:41:41,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1552586.6666666667, ans=0.125 2023-12-24 06:41:53,397 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.555e+01 4.048e+01 4.230e+01 4.426e+01 5.079e+01, threshold=8.460e+01, percent-clipped=0.0 2023-12-24 06:42:08,580 INFO [train.py:886] (1/4) Epoch 49, batch 4150, loss[loss=0.009372, audio_tagging_loss=0.009372, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4939880.44 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:42:08,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. 
limit=15.0 2023-12-24 06:42:10,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1552786.6666666667, ans=0.125 2023-12-24 06:42:20,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1552853.3333333333, ans=0.125 2023-12-24 06:42:20,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1552853.3333333333, ans=0.125 2023-12-24 06:42:24,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-12-24 06:42:38,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1552986.6666666667, ans=0.125 2023-12-24 06:42:45,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2023-12-24 06:42:50,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1553053.3333333333, ans=0.2 2023-12-24 06:42:52,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1553053.3333333333, ans=0.125 2023-12-24 06:42:54,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0 2023-12-24 06:42:59,756 INFO [train.py:886] (1/4) Epoch 49, batch 4200, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4941548.70 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:43:14,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1553186.6666666667, ans=0.125 2023-12-24 06:43:16,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1553186.6666666667, ans=0.2 2023-12-24 06:43:22,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1553253.3333333333, ans=0.125 2023-12-24 06:43:36,489 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.716e+01 4.005e+01 4.208e+01 4.384e+01 4.988e+01, threshold=8.417e+01, percent-clipped=0.0 2023-12-24 06:43:38,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1553320.0, ans=0.1 2023-12-24 06:43:41,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1553386.6666666667, ans=0.125 2023-12-24 06:43:46,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1553386.6666666667, ans=0.0 2023-12-24 06:43:49,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.13 vs. limit=6.0 2023-12-24 06:43:52,897 INFO [train.py:886] (1/4) Epoch 49, batch 4250, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. 
], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4942058.80 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:44:22,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0 2023-12-24 06:44:32,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1553653.3333333333, ans=0.125 2023-12-24 06:44:43,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-12-24 06:44:44,047 INFO [train.py:886] (1/4) Epoch 49, batch 4300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01039, audio_tagging_loss=0.01039, over 4946059.43 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 64.0 2023-12-24 06:44:46,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1553786.6666666667, ans=0.1 2023-12-24 06:45:03,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1553853.3333333333, ans=0.125 2023-12-24 06:45:15,687 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:45:19,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1553986.6666666667, ans=0.125 2023-12-24 06:45:21,780 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.664e+01 3.972e+01 4.193e+01 4.366e+01 5.433e+01, threshold=8.386e+01, percent-clipped=0.0 2023-12-24 06:45:35,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2023-12-24 06:45:37,144 INFO [train.py:886] (1/4) Epoch 49, batch 4350, loss[loss=0.009891, audio_tagging_loss=0.009891, over 24750.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4951935.68 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:45:49,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1554186.6666666667, ans=0.125 2023-12-24 06:46:24,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1554386.6666666667, ans=0.125 2023-12-24 06:46:28,750 INFO [train.py:886] (1/4) Epoch 49, batch 4400, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4951198.96 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:46:44,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1554520.0, ans=0.0 2023-12-24 06:46:47,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1554520.0, ans=0.125 2023-12-24 06:46:55,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. 
limit=6.0 2023-12-24 06:46:56,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1554586.6666666667, ans=0.2 2023-12-24 06:46:59,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1554653.3333333333, ans=0.1 2023-12-24 06:47:07,213 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.673e+01 4.187e+01 4.293e+01 4.475e+01 5.860e+01, threshold=8.587e+01, percent-clipped=0.0 2023-12-24 06:47:09,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1554653.3333333333, ans=0.05 2023-12-24 06:47:13,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1554720.0, ans=0.125 2023-12-24 06:47:16,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1554720.0, ans=0.2 2023-12-24 06:47:20,687 INFO [train.py:886] (1/4) Epoch 49, batch 4450, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4948018.69 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:47:28,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1554786.6666666667, ans=0.125 2023-12-24 06:47:32,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0 2023-12-24 06:47:47,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2023-12-24 06:47:49,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1554920.0, ans=0.125 2023-12-24 06:48:12,853 INFO [train.py:886] (1/4) Epoch 49, batch 4500, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4945959.39 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:48:21,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=8.0 2023-12-24 06:48:29,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-12-24 06:48:30,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1555186.6666666667, ans=0.035 2023-12-24 06:48:34,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1555253.3333333333, ans=0.2 2023-12-24 06:48:46,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1555320.0, ans=0.2 2023-12-24 06:48:49,614 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.358e+01 3.945e+01 4.149e+01 4.337e+01 5.254e+01, threshold=8.299e+01, percent-clipped=0.0 2023-12-24 06:49:03,790 INFO [train.py:886] (1/4) Epoch 49, batch 4550, loss[loss=0.01092, audio_tagging_loss=0.01092, over 24750.00 frames. 
], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4949930.43 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:49:07,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1555453.3333333333, ans=0.04949747468305833 2023-12-24 06:49:10,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1555453.3333333333, ans=0.05 2023-12-24 06:49:39,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1555653.3333333333, ans=0.0 2023-12-24 06:49:56,025 INFO [train.py:886] (1/4) Epoch 49, batch 4600, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4955099.86 frames. ], batch size: 100, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:49:57,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1555786.6666666667, ans=0.1 2023-12-24 06:50:11,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1555853.3333333333, ans=0.125 2023-12-24 06:50:32,531 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.708e+01 4.082e+01 4.227e+01 4.419e+01 5.112e+01, threshold=8.454e+01, percent-clipped=0.0 2023-12-24 06:50:38,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1556053.3333333333, ans=0.0 2023-12-24 06:50:41,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1556053.3333333333, ans=0.09899494936611666 2023-12-24 06:50:46,515 INFO [train.py:886] (1/4) Epoch 49, batch 4650, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4956970.54 frames. 
], batch size: 100, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:50:52,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1556120.0, ans=0.1 2023-12-24 06:50:52,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1556120.0, ans=0.0 2023-12-24 06:51:02,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1556186.6666666667, ans=0.0 2023-12-24 06:51:12,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1556253.3333333333, ans=0.1 2023-12-24 06:51:16,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1556320.0, ans=0.125 2023-12-24 06:51:18,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1556320.0, ans=0.0 2023-12-24 06:51:22,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1556320.0, ans=0.125 2023-12-24 06:51:32,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1556386.6666666667, ans=0.1 2023-12-24 06:51:35,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1556386.6666666667, ans=0.125 2023-12-24 06:51:37,702 INFO [train.py:886] (1/4) Epoch 49, batch 4700, loss[loss=0.009181, audio_tagging_loss=0.009181, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4949688.52 frames. ], batch size: 99, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:51:38,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2023-12-24 06:51:39,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2023-12-24 06:51:41,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1556453.3333333333, ans=0.125 2023-12-24 06:51:48,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1556520.0, ans=10.0 2023-12-24 06:52:10,982 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.696e+01 4.105e+01 4.278e+01 4.465e+01 5.122e+01, threshold=8.556e+01, percent-clipped=0.0 2023-12-24 06:52:11,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=8.0 2023-12-24 06:52:24,086 INFO [train.py:886] (1/4) Epoch 49, batch 4750, loss[loss=0.008163, audio_tagging_loss=0.008163, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4942428.99 frames. ], batch size: 100, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:52:59,427 INFO [train.py:886] (1/4) Epoch 50, batch 0, loss[loss=0.02559, audio_tagging_loss=0.02559, over 23978.00 frames. ], tot_loss[loss=0.02559, audio_tagging_loss=0.02559, over 23978.00 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 06:52:59,428 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 06:53:10,724 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5395, 3.5015, 3.9551, 4.2171], device='cuda:1') 2023-12-24 06:53:21,090 INFO [train.py:917] (1/4) Epoch 50, validation: loss=0.03747, audio_tagging_loss=0.03747, over 3737520.00 frames. 2023-12-24 06:53:21,090 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 06:53:25,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1556893.3333333333, ans=0.1 2023-12-24 06:53:28,868 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:53:43,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1557026.6666666667, ans=0.09899494936611666 2023-12-24 06:53:44,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1557026.6666666667, ans=0.125 2023-12-24 06:53:52,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1557093.3333333333, ans=0.125 2023-12-24 06:53:56,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1557093.3333333333, ans=0.0 2023-12-24 06:54:00,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1557093.3333333333, ans=0.0 2023-12-24 06:54:00,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1557093.3333333333, ans=0.0 2023-12-24 06:54:01,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1557160.0, ans=0.0 2023-12-24 06:54:11,355 INFO [train.py:886] (1/4) Epoch 50, batch 50, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01726, audio_tagging_loss=0.01726, over 1117613.81 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:54:35,196 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.740e+01 4.510e+01 5.078e+01 5.716e+01 1.112e+02, threshold=1.016e+02, percent-clipped=6.0 2023-12-24 06:55:04,535 INFO [train.py:886] (1/4) Epoch 50, batch 100, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 1971957.00 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:55:18,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.11 vs. 
limit=22.5 2023-12-24 06:55:24,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1557693.3333333333, ans=0.2 2023-12-24 06:55:26,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1557693.3333333333, ans=0.0 2023-12-24 06:55:37,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1557760.0, ans=0.5 2023-12-24 06:55:43,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1557760.0, ans=0.125 2023-12-24 06:55:44,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1557826.6666666667, ans=0.0 2023-12-24 06:55:47,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1557826.6666666667, ans=0.0 2023-12-24 06:55:48,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1557826.6666666667, ans=0.0 2023-12-24 06:55:51,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1557826.6666666667, ans=0.0 2023-12-24 06:55:53,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2023-12-24 06:55:54,690 INFO [train.py:886] (1/4) Epoch 50, batch 150, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 2633335.06 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:55:57,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1557893.3333333333, ans=0.0 2023-12-24 06:56:09,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.95 vs. limit=22.5 2023-12-24 06:56:13,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1557960.0, ans=0.125 2023-12-24 06:56:18,341 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.960e+01 4.220e+01 4.441e+01 4.666e+01 5.364e+01, threshold=8.881e+01, percent-clipped=0.0 2023-12-24 06:56:24,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1558026.6666666667, ans=0.04949747468305833 2023-12-24 06:56:41,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1558160.0, ans=0.125 2023-12-24 06:56:47,111 INFO [train.py:886] (1/4) Epoch 50, batch 200, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 3146942.02 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:56:54,130 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:56:56,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1558293.3333333333, ans=0.125 2023-12-24 06:56:58,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1558293.3333333333, ans=0.125 2023-12-24 06:57:02,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1558293.3333333333, ans=0.125 2023-12-24 06:57:07,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1558360.0, ans=0.1 2023-12-24 06:57:37,404 INFO [train.py:886] (1/4) Epoch 50, batch 250, loss[loss=0.008833, audio_tagging_loss=0.008833, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 3550887.80 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:57:45,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=12.0 2023-12-24 06:57:46,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1558560.0, ans=0.1 2023-12-24 06:57:48,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1558626.6666666667, ans=0.0 2023-12-24 06:57:56,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1558626.6666666667, ans=0.125 2023-12-24 06:58:00,321 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.630e+01 4.054e+01 4.252e+01 4.416e+01 4.947e+01, threshold=8.505e+01, percent-clipped=0.0 2023-12-24 06:58:03,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1558693.3333333333, ans=0.2 2023-12-24 06:58:15,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1558760.0, ans=0.1 2023-12-24 06:58:18,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1558826.6666666667, ans=0.125 2023-12-24 06:58:23,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1558826.6666666667, ans=0.125 2023-12-24 06:58:29,397 INFO [train.py:886] (1/4) Epoch 50, batch 300, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24951.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 3859655.28 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:58:34,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1558893.3333333333, ans=0.1 2023-12-24 06:58:36,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1558893.3333333333, ans=0.125 2023-12-24 06:58:43,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. 
limit=15.0 2023-12-24 06:58:44,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1558960.0, ans=10.0 2023-12-24 06:58:49,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1559026.6666666667, ans=0.125 2023-12-24 06:58:50,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1559026.6666666667, ans=0.125 2023-12-24 06:59:03,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1559093.3333333333, ans=0.125 2023-12-24 06:59:20,862 INFO [train.py:886] (1/4) Epoch 50, batch 350, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4101674.14 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:59:26,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1559226.6666666667, ans=0.0 2023-12-24 06:59:43,068 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.541e+01 4.035e+01 4.228e+01 4.389e+01 4.773e+01, threshold=8.456e+01, percent-clipped=0.0 2023-12-24 06:59:47,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1559360.0, ans=0.125 2023-12-24 06:59:53,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1559426.6666666667, ans=6.0 2023-12-24 06:59:56,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-12-24 07:00:02,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1559493.3333333333, ans=0.125 2023-12-24 07:00:11,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1559560.0, ans=0.125 2023-12-24 07:00:12,433 INFO [train.py:886] (1/4) Epoch 50, batch 400, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4288628.52 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:00:45,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1559760.0, ans=0.2 2023-12-24 07:00:58,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-12-24 07:01:04,258 INFO [train.py:886] (1/4) Epoch 50, batch 450, loss[loss=0.01005, audio_tagging_loss=0.01005, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4428146.59 frames. 
2023-12-24 07:01:15,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1559960.0, ans=0.125
2023-12-24 07:01:28,027 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.735e+01 4.049e+01 4.184e+01 4.376e+01 4.940e+01, threshold=8.368e+01, percent-clipped=0.0
2023-12-24 07:01:28,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1560026.6666666667, ans=0.0
2023-12-24 07:01:54,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1560160.0, ans=0.125
2023-12-24 07:01:57,592 INFO [train.py:886] (1/4) Epoch 50, batch 500, loss[loss=0.009592, audio_tagging_loss=0.009592, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4545854.31 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:01:58,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=22.5
2023-12-24 07:02:05,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. limit=10.0
2023-12-24 07:02:09,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1560293.3333333333, ans=0.1
2023-12-24 07:02:10,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.35 vs. limit=15.0
2023-12-24 07:02:11,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0
2023-12-24 07:02:49,015 INFO [train.py:886] (1/4) Epoch 50, batch 550, loss[loss=0.009824, audio_tagging_loss=0.009824, over 24013.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4639849.95 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:02:56,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1560560.0, ans=0.125
2023-12-24 07:03:12,063 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.758e+01 4.103e+01 4.273e+01 4.476e+01 5.412e+01, threshold=8.546e+01, percent-clipped=0.0
2023-12-24 07:03:12,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0
2023-12-24 07:03:41,673 INFO [train.py:886] (1/4) Epoch 50, batch 600, loss[loss=0.0115, audio_tagging_loss=0.0115, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4710223.54 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:03:50,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0
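In the Clipping_scale warnings, the five numbers read as min/25%/50%/75%/max of recently observed gradient norms, and the reported threshold equals Clipping_scale times the median (e.g. 2.0 x 4.184e+01 = 8.368e+01 above). A hypothetical reconstruction of that bookkeeping, consistent with the logged numbers but not the actual optim.py code:

```python
import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> str:
    # grad_norms: 1-D tensor of recently observed gradient norms.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = float(clipping_scale * q[2])  # scale times the median norm
    pct = float((grad_norms > threshold).float().mean() * 100.0)
    return (f"Clipping_scale={clipping_scale}, grad-norm quartiles "
            + " ".join(f"{v:.3e}" for v in q.tolist())
            + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")

# Feeding in the five quartile values themselves reproduces the shape of the line:
print(clipping_report(torch.tensor([37.35, 40.49, 41.84, 43.76, 49.40])))
```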
2023-12-24 07:04:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561026.6666666667, ans=0.1
2023-12-24 07:04:12,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1561093.3333333333, ans=0.125
2023-12-24 07:04:12,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561093.3333333333, ans=0.1
2023-12-24 07:04:17,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=8.0
2023-12-24 07:04:17,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1561093.3333333333, ans=0.125
2023-12-24 07:04:34,497 INFO [train.py:886] (1/4) Epoch 50, batch 650, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4759784.14 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:04:47,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1561293.3333333333, ans=0.125
2023-12-24 07:04:56,134 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.774e+01 4.125e+01 4.275e+01 4.523e+01 5.661e+01, threshold=8.549e+01, percent-clipped=0.0
2023-12-24 07:05:20,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1561493.3333333333, ans=0.0
2023-12-24 07:05:25,712 INFO [train.py:886] (1/4) Epoch 50, batch 700, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4797912.39 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:05:38,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1561626.6666666667, ans=0.0
2023-12-24 07:05:56,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0
2023-12-24 07:06:01,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=12.0
2023-12-24 07:06:09,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1561826.6666666667, ans=0.0
2023-12-24 07:06:10,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1561826.6666666667, ans=0.0
2023-12-24 07:06:15,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1561826.6666666667, ans=0.125
2023-12-24 07:06:18,173 INFO [train.py:886] (1/4) Epoch 50, batch 750, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4833103.73 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:06:23,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561893.3333333333, ans=0.1
2023-12-24 07:06:29,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1561960.0, ans=0.125
2023-12-24 07:06:33,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1561960.0, ans=0.0
2023-12-24 07:06:40,460 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.722e+01 4.000e+01 4.156e+01 4.359e+01 5.451e+01, threshold=8.313e+01, percent-clipped=0.0
2023-12-24 07:06:40,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1562026.6666666667, ans=0.125
2023-12-24 07:07:03,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0
2023-12-24 07:07:09,176 INFO [train.py:886] (1/4) Epoch 50, batch 800, loss[loss=0.009723, audio_tagging_loss=0.009723, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4865466.55 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:07:19,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1562293.3333333333, ans=0.0
2023-12-24 07:08:00,084 INFO [train.py:886] (1/4) Epoch 50, batch 850, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4882024.88 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:08:09,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1562626.6666666667, ans=0.125
2023-12-24 07:08:10,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5
2023-12-24 07:08:23,900 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.686e+01 3.999e+01 4.249e+01 4.473e+01 4.944e+01, threshold=8.498e+01, percent-clipped=0.0
2023-12-24 07:08:26,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1562693.3333333333, ans=0.0
2023-12-24 07:08:48,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1562826.6666666667, ans=0.125
2023-12-24 07:08:51,885 INFO [train.py:886] (1/4) Epoch 50, batch 900, loss[loss=0.008302, audio_tagging_loss=0.008302, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4895840.09 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:09:08,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1562960.0, ans=0.05
2023-12-24 07:09:29,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1563093.3333333333, ans=0.125
2023-12-24 07:09:34,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1563160.0, ans=0.95
2023-12-24 07:09:36,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1563160.0, ans=0.2
2023-12-24 07:09:37,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1563160.0, ans=0.0
2023-12-24 07:09:43,836 INFO [train.py:886] (1/4) Epoch 50, batch 950, loss[loss=0.008978, audio_tagging_loss=0.008978, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4904094.48 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:09:52,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1563226.6666666667, ans=0.125
2023-12-24 07:09:54,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1563293.3333333333, ans=0.125
2023-12-24 07:09:58,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1563293.3333333333, ans=0.0
2023-12-24 07:10:05,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1563360.0, ans=0.0
2023-12-24 07:10:07,346 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.705e+01 4.079e+01 4.256e+01 4.403e+01 5.816e+01, threshold=8.513e+01, percent-clipped=0.0
2023-12-24 07:10:10,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1563360.0, ans=0.1
2023-12-24 07:10:27,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1563493.3333333333, ans=0.1
2023-12-24 07:10:32,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1563493.3333333333, ans=0.05
2023-12-24 07:10:36,915 INFO [train.py:886] (1/4) Epoch 50, batch 1000, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4906795.47 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:10:48,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1563626.6666666667, ans=0.0
2023-12-24 07:11:28,045 INFO [train.py:886] (1/4) Epoch 50, batch 1050, loss[loss=0.008842, audio_tagging_loss=0.008842, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4920682.69 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:11:51,237 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 4.047e+01 4.197e+01 4.433e+01 5.378e+01, threshold=8.395e+01, percent-clipped=0.0
2023-12-24 07:11:51,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1564026.6666666667, ans=0.125
2023-12-24 07:12:05,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=12.0
2023-12-24 07:12:20,616 INFO [train.py:886] (1/4) Epoch 50, batch 1100, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4930034.25 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:12:42,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1564360.0, ans=0.125
2023-12-24 07:13:10,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0
2023-12-24 07:13:11,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1564560.0, ans=0.125
2023-12-24 07:13:12,698 INFO [train.py:886] (1/4) Epoch 50, batch 1150, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4939647.12 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:13:12,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1564560.0, ans=0.0
2023-12-24 07:13:34,236 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.707e+01 4.052e+01 4.234e+01 4.430e+01 4.895e+01, threshold=8.468e+01, percent-clipped=0.0
2023-12-24 07:13:39,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1564693.3333333333, ans=0.0
2023-12-24 07:13:46,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0
2023-12-24 07:13:47,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1564760.0, ans=0.125
2023-12-24 07:13:48,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.45 vs. limit=10.0
2023-12-24 07:13:54,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1564826.6666666667, ans=0.125
2023-12-24 07:13:59,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=22.5
2023-12-24 07:14:03,665 INFO [train.py:886] (1/4) Epoch 50, batch 1200, loss[loss=0.009059, audio_tagging_loss=0.009059, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4948239.45 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
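The Whitening lines compare a per-module statistic against a limit. One metric with exactly this flavour (>= 1, equal to 1 only for perfectly decorrelated, equal-variance channels) is n * sum(C^2) / trace(C)^2 of the channel covariance C. A sketch under that assumption; the metric actually computed by scaling.py may be defined differently:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one module.
    x = x - x.mean(dim=0)
    c = (x.t() @ x) / x.shape[0]  # channel covariance
    n = c.shape[0]
    # n * sum(C*C) / trace(C)^2 == n * sum(eig^2) / (sum(eig))^2 >= 1 by
    # Cauchy-Schwarz, with equality iff C is a multiple of the identity
    # (i.e. the features are "white").
    return float(n * (c * c).sum() / c.diagonal().sum() ** 2)

x = torch.randn(1000, 384)   # nearly white input
print(whitening_metric(x))   # modestly above 1 from sampling noise;
                             # correlated channels push it far higher
```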
2023-12-24 07:14:03,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1564893.3333333333, ans=0.07
2023-12-24 07:14:16,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0
2023-12-24 07:14:25,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1565026.6666666667, ans=0.0
2023-12-24 07:14:27,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1565026.6666666667, ans=0.125
2023-12-24 07:14:32,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1565026.6666666667, ans=0.0
2023-12-24 07:14:43,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1565093.3333333333, ans=0.125
2023-12-24 07:14:43,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1565093.3333333333, ans=0.1
2023-12-24 07:14:50,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.34 vs. limit=5.0
2023-12-24 07:14:51,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1565160.0, ans=0.125
2023-12-24 07:14:55,936 INFO [train.py:886] (1/4) Epoch 50, batch 1250, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4947209.73 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:15:03,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1565226.6666666667, ans=10.0
2023-12-24 07:15:19,848 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.704e+01 4.120e+01 4.290e+01 4.537e+01 5.051e+01, threshold=8.580e+01, percent-clipped=0.0
2023-12-24 07:15:34,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=12.0
2023-12-24 07:15:36,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1565426.6666666667, ans=0.2
2023-12-24 07:15:37,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1565493.3333333333, ans=0.2
2023-12-24 07:15:47,910 INFO [train.py:886] (1/4) Epoch 50, batch 1300, loss[loss=0.009814, audio_tagging_loss=0.009814, over 24083.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4945459.91 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:16:26,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1565760.0, ans=0.125
2023-12-24 07:16:32,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0
2023-12-24 07:16:39,857 INFO [train.py:886] (1/4) Epoch 50, batch 1350, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4946915.71 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:16:42,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0
2023-12-24 07:16:52,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1565960.0, ans=0.07
2023-12-24 07:17:01,929 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.679e+01 4.124e+01 4.286e+01 4.481e+01 5.287e+01, threshold=8.572e+01, percent-clipped=0.0
2023-12-24 07:17:18,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1566093.3333333333, ans=0.0
2023-12-24 07:17:30,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=22.5
2023-12-24 07:17:30,535 INFO [train.py:886] (1/4) Epoch 50, batch 1400, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4950254.78 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:17:55,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1566360.0, ans=0.04949747468305833
2023-12-24 07:18:02,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1566426.6666666667, ans=0.125
2023-12-24 07:18:11,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1566493.3333333333, ans=0.0
2023-12-24 07:18:11,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0
2023-12-24 07:18:19,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1566493.3333333333, ans=0.125
2023-12-24 07:18:20,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1566560.0, ans=0.1
2023-12-24 07:18:21,444 INFO [train.py:886] (1/4) Epoch 50, batch 1450, loss[loss=0.009493, audio_tagging_loss=0.009493, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4950187.65 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:18:22,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1566560.0, ans=0.125
2023-12-24 07:18:35,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1566626.6666666667, ans=0.125
2023-12-24 07:18:40,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=12.0
2023-12-24 07:18:41,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1566626.6666666667, ans=0.2
2023-12-24 07:18:44,689 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 4.091e+01 4.245e+01 4.456e+01 5.361e+01, threshold=8.489e+01, percent-clipped=0.0
2023-12-24 07:18:44,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1566693.3333333333, ans=0.125
2023-12-24 07:18:57,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=15.0
2023-12-24 07:19:04,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1566826.6666666667, ans=0.125
2023-12-24 07:19:09,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1566826.6666666667, ans=0.125
2023-12-24 07:19:14,059 INFO [train.py:886] (1/4) Epoch 50, batch 1500, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4953914.62 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0
2023-12-24 07:19:26,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1566960.0, ans=0.125
2023-12-24 07:19:41,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1567026.6666666667, ans=0.2
2023-12-24 07:19:41,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1567026.6666666667, ans=0.125
2023-12-24 07:19:43,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1567026.6666666667, ans=0.125
2023-12-24 07:20:06,502 INFO [train.py:886] (1/4) Epoch 50, batch 1550, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4951266.98 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:20:06,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1567226.6666666667, ans=0.125
2023-12-24 07:20:11,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1567226.6666666667, ans=0.2
2023-12-24 07:20:12,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1567226.6666666667, ans=0.1
2023-12-24 07:20:28,750 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.759e+01 4.163e+01 4.336e+01 4.479e+01 4.983e+01, threshold=8.671e+01, percent-clipped=0.0
2023-12-24 07:20:42,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1567426.6666666667, ans=0.5
2023-12-24 07:20:44,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.38 vs. limit=22.5
2023-12-24 07:20:45,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1567426.6666666667, ans=0.0
2023-12-24 07:20:57,218 INFO [train.py:886] (1/4) Epoch 50, batch 1600, loss[loss=0.01056, audio_tagging_loss=0.01056, over 24024.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4951713.00 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:21:03,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1567560.0, ans=0.125
2023-12-24 07:21:04,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1567560.0, ans=0.125
2023-12-24 07:21:27,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1567760.0, ans=0.0
2023-12-24 07:21:36,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=12.0
2023-12-24 07:21:49,919 INFO [train.py:886] (1/4) Epoch 50, batch 1650, loss[loss=0.008935, audio_tagging_loss=0.008935, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4950402.90 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:22:08,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1567960.0, ans=0.035
2023-12-24 07:22:14,136 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.627e+01 4.076e+01 4.290e+01 4.470e+01 5.188e+01, threshold=8.579e+01, percent-clipped=0.0
2023-12-24 07:22:24,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5
2023-12-24 07:22:33,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1568160.0, ans=0.1
2023-12-24 07:22:38,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5
2023-12-24 07:22:42,265 INFO [train.py:886] (1/4) Epoch 50, batch 1700, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4953348.37 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:22:52,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1568293.3333333333, ans=0.95
2023-12-24 07:23:02,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1568360.0, ans=0.125
2023-12-24 07:23:28,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1568493.3333333333, ans=0.125
2023-12-24 07:23:34,036 INFO [train.py:886] (1/4) Epoch 50, batch 1750, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4951305.57 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:23:40,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1568560.0, ans=0.07
2023-12-24 07:23:53,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1568626.6666666667, ans=0.0
2023-12-24 07:23:54,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0
2023-12-24 07:23:57,034 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.982e+01 4.200e+01 4.347e+01 4.919e+01, threshold=8.401e+01, percent-clipped=0.0
2023-12-24 07:24:05,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1568760.0, ans=10.0
2023-12-24 07:24:07,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1568760.0, ans=0.07
2023-12-24 07:24:08,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1568760.0, ans=0.125
2023-12-24 07:24:10,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1568760.0, ans=0.125
2023-12-24 07:24:22,256 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2023-12-24 07:24:26,392 INFO [train.py:886] (1/4) Epoch 50, batch 1800, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4949750.90 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:24:26,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1568893.3333333333, ans=0.2
2023-12-24 07:24:37,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5
2023-12-24 07:24:38,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1568960.0, ans=0.95
2023-12-24 07:24:44,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1569026.6666666667, ans=0.0
2023-12-24 07:25:15,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1569160.0, ans=0.2
2023-12-24 07:25:16,822 INFO [train.py:886] (1/4) Epoch 50, batch 1850, loss[loss=0.01401, audio_tagging_loss=0.01401, over 24945.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4949246.43 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:25:18,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0
2023-12-24 07:25:30,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1569293.3333333333, ans=0.125
2023-12-24 07:25:30,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0
2023-12-24 07:25:40,881 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.702e+01 4.153e+01 4.266e+01 4.443e+01 5.377e+01, threshold=8.532e+01, percent-clipped=0.0
2023-12-24 07:26:02,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1569493.3333333333, ans=10.0
2023-12-24 07:26:03,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=1569493.3333333333, ans=22.5
2023-12-24 07:26:10,195 INFO [train.py:886] (1/4) Epoch 50, batch 1900, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4940712.36 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:26:36,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1569693.3333333333, ans=0.1
2023-12-24 07:26:56,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0
2023-12-24 07:27:01,931 INFO [train.py:886] (1/4) Epoch 50, batch 1950, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4934722.49 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:27:02,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1569893.3333333333, ans=0.0
2023-12-24 07:27:09,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1569893.3333333333, ans=0.0
2023-12-24 07:27:11,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1569960.0, ans=0.1
2023-12-24 07:27:12,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5
2023-12-24 07:27:14,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1569960.0, ans=0.125
2023-12-24 07:27:17,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1569960.0, ans=0.125
2023-12-24 07:27:17,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1569960.0, ans=0.2
2023-12-24 07:27:21,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0
2023-12-24 07:27:23,193 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.684e+01 4.036e+01 4.252e+01 4.490e+01 5.188e+01, threshold=8.504e+01, percent-clipped=0.0
2023-12-24 07:27:37,734 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:27:38,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1570093.3333333333, ans=0.0
2023-12-24 07:27:49,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1570160.0, ans=0.0
2023-12-24 07:27:51,717 INFO [train.py:886] (1/4) Epoch 50, batch 2000, loss[loss=0.01069, audio_tagging_loss=0.01069, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4942268.14 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:27:53,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0
2023-12-24 07:28:11,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1570293.3333333333, ans=10.0
2023-12-24 07:28:11,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1570293.3333333333, ans=0.025
2023-12-24 07:28:18,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=22.5
2023-12-24 07:28:38,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1570493.3333333333, ans=0.125
2023-12-24 07:28:44,776 INFO [train.py:886] (1/4) Epoch 50, batch 2050, loss[loss=0.008501, audio_tagging_loss=0.008501, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4943080.89 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:28:54,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.91 vs. limit=10.0
2023-12-24 07:29:06,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1570693.3333333333, ans=0.0
2023-12-24 07:29:06,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1570693.3333333333, ans=0.0
2023-12-24 07:29:07,141 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.692e+01 4.007e+01 4.186e+01 4.409e+01 4.904e+01, threshold=8.372e+01, percent-clipped=0.0
2023-12-24 07:29:15,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1570760.0, ans=0.125
2023-12-24 07:29:16,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1570760.0, ans=0.0
2023-12-24 07:29:35,805 INFO [train.py:886] (1/4) Epoch 50, batch 2100, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4948587.10 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0
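grad_scale stepping from 32.0 to 64.0 at batch 2050 (and dropping back to 32.0 after batch 2500 below) is the signature of dynamic loss scaling under use_fp16: the scale doubles after a long run of finite-gradient steps and halves when an overflow is detected. The standard PyTorch pattern looks like this (a self-contained toy under that assumption, not the actual train.py wiring):

```python
import torch

model = torch.nn.Linear(10, 1).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # grows 16 -> 32 -> 64 ...

for step in range(5):
    opt.zero_grad()
    x = torch.randn(8, 10, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(opt)               # unscales grads; skips the step on inf/nan
    scaler.update()                # doubles the scale periodically, halves on overflow
    print(step, scaler.get_scale())
```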
2023-12-24 07:29:36,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1570893.3333333333, ans=0.125
2023-12-24 07:29:36,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1570893.3333333333, ans=0.0
2023-12-24 07:29:41,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0
2023-12-24 07:29:44,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1570893.3333333333, ans=0.0
2023-12-24 07:29:45,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1570960.0, ans=0.0
2023-12-24 07:29:54,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2023-12-24 07:29:57,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1571026.6666666667, ans=0.125
2023-12-24 07:29:58,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1571026.6666666667, ans=0.1
2023-12-24 07:30:06,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1571093.3333333333, ans=0.125
2023-12-24 07:30:28,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0
2023-12-24 07:30:28,619 INFO [train.py:886] (1/4) Epoch 50, batch 2150, loss[loss=0.009864, audio_tagging_loss=0.009864, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4949666.94 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:30:31,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1571226.6666666667, ans=0.125
2023-12-24 07:30:31,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1571226.6666666667, ans=15.0
2023-12-24 07:30:40,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1571293.3333333333, ans=0.125
2023-12-24 07:30:41,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1571293.3333333333, ans=0.125
2023-12-24 07:30:44,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571293.3333333333, ans=0.1
2023-12-24 07:30:51,641 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.676e+01 4.091e+01 4.279e+01 4.499e+01 5.273e+01, threshold=8.558e+01, percent-clipped=0.0
2023-12-24 07:30:55,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1571360.0, ans=22.5
2023-12-24 07:31:21,072 INFO [train.py:886] (1/4) Epoch 50, batch 2200, loss[loss=0.01031, audio_tagging_loss=0.01031, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4949127.04 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:32:04,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1571826.6666666667, ans=0.125
2023-12-24 07:32:06,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1571826.6666666667, ans=0.2
2023-12-24 07:32:12,252 INFO [train.py:886] (1/4) Epoch 50, batch 2250, loss[loss=0.008008, audio_tagging_loss=0.008008, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4945437.30 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:32:35,510 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.741e+01 4.093e+01 4.254e+01 4.470e+01 6.173e+01, threshold=8.508e+01, percent-clipped=0.0
2023-12-24 07:33:04,680 INFO [train.py:886] (1/4) Epoch 50, batch 2300, loss[loss=0.01066, audio_tagging_loss=0.01066, over 24044.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4947074.13 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:33:09,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1572226.6666666667, ans=0.1
2023-12-24 07:33:14,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1572293.3333333333, ans=0.1
2023-12-24 07:33:31,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1572360.0, ans=0.125
2023-12-24 07:33:50,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0
2023-12-24 07:33:56,439 INFO [train.py:886] (1/4) Epoch 50, batch 2350, loss[loss=0.01017, audio_tagging_loss=0.01017, over 24750.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4954841.68 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:33:58,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1572560.0, ans=0.0
2023-12-24 07:34:13,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1572626.6666666667, ans=0.2
2023-12-24 07:34:18,579 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.700e+01 4.055e+01 4.217e+01 4.418e+01 5.746e+01, threshold=8.434e+01, percent-clipped=0.0
2023-12-24 07:34:36,307 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:34:43,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1572826.6666666667, ans=0.125
2023-12-24 07:34:46,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0
2023-12-24 07:34:48,220 INFO [train.py:886] (1/4) Epoch 50, batch 2400, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4952713.66 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0
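Each train.py:886 line pairs the current batch's loss with tot_loss, whose frame count plateaus near 4.95M while each batch contributes only ~25k frames; that is consistent with a decayed (windowed) running average rather than a plain cumulative one. A hypothetical sketch of such bookkeeping, with the decay factor chosen purely for illustration (the actual train.py statistics may be maintained differently):

```python
class DecayedLoss:
    """Sketch of a decayed, frame-weighted running loss like tot_loss[...]."""

    def __init__(self, decay: float = 0.995):  # illustrative decay, not icefall's
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = DecayedLoss()
for batch_loss in (0.008008, 0.01066, 0.01017, 0.01204):  # batches 2250-2400 above
    tracker.update(batch_loss, 25000.0)
print(round(tracker.value, 5))  # smoothed value, close to the logged tot_loss
```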
2023-12-24 07:34:53,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5
2023-12-24 07:35:08,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1573026.6666666667, ans=0.125
2023-12-24 07:35:17,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1573026.6666666667, ans=0.2
2023-12-24 07:35:24,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1573093.3333333333, ans=0.0
2023-12-24 07:35:24,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1573093.3333333333, ans=0.1
2023-12-24 07:35:26,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1573093.3333333333, ans=0.0
2023-12-24 07:35:40,406 INFO [train.py:886] (1/4) Epoch 50, batch 2450, loss[loss=0.009401, audio_tagging_loss=0.009401, over 25000.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4957809.77 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0
2023-12-24 07:35:47,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1573226.6666666667, ans=0.0
2023-12-24 07:35:59,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1573293.3333333333, ans=0.04949747468305833
2023-12-24 07:36:04,787 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.654e+01 4.083e+01 4.271e+01 4.433e+01 5.085e+01, threshold=8.543e+01, percent-clipped=0.0
2023-12-24 07:36:04,988 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:36:12,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1573360.0, ans=0.125
2023-12-24 07:36:20,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0
2023-12-24 07:36:33,468 INFO [train.py:886] (1/4) Epoch 50, batch 2500, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24949.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4952405.72 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:36:39,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1573560.0, ans=0.125
2023-12-24 07:36:47,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0
2023-12-24 07:36:47,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1573626.6666666667, ans=0.125
2023-12-24 07:36:58,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1573693.3333333333, ans=0.125
2023-12-24 07:36:59,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1573693.3333333333, ans=0.1
2023-12-24 07:37:04,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1573760.0, ans=0.125
2023-12-24 07:37:11,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1573760.0, ans=0.0
2023-12-24 07:37:16,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1573826.6666666667, ans=0.0
2023-12-24 07:37:19,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1573826.6666666667, ans=0.125
2023-12-24 07:37:22,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0
2023-12-24 07:37:25,234 INFO [train.py:886] (1/4) Epoch 50, batch 2550, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4951679.28 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:37:28,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1573893.3333333333, ans=0.125
2023-12-24 07:37:43,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1573960.0, ans=0.04949747468305833
2023-12-24 07:37:44,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0
2023-12-24 07:37:46,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1574026.6666666667, ans=0.2
2023-12-24 07:37:49,908 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 4.126e+01 4.329e+01 4.523e+01 5.381e+01, threshold=8.659e+01, percent-clipped=0.0
2023-12-24 07:37:53,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1574026.6666666667, ans=0.125
2023-12-24 07:38:03,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1574093.3333333333, ans=0.125
2023-12-24 07:38:05,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1574160.0, ans=0.1
2023-12-24 07:38:11,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0
2023-12-24 07:38:18,389 INFO [train.py:886] (1/4) Epoch 50, batch 2600, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24084.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4948486.22 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:39:09,503 INFO [train.py:886] (1/4) Epoch 50, batch 2650, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4948319.39 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:39:21,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1574626.6666666667, ans=0.2
2023-12-24 07:39:26,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0
2023-12-24 07:39:33,610 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.680e+01 4.070e+01 4.297e+01 4.488e+01 5.436e+01, threshold=8.593e+01, percent-clipped=0.0
2023-12-24 07:39:35,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0
2023-12-24 07:39:52,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1574826.6666666667, ans=0.125
2023-12-24 07:39:53,841 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:39:57,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1574826.6666666667, ans=0.07
2023-12-24 07:40:01,877 INFO [train.py:886] (1/4) Epoch 50, batch 2700, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4956433.47 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:40:06,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2023-12-24 07:40:25,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0
2023-12-24 07:40:35,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1575093.3333333333, ans=0.0
2023-12-24 07:40:46,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575160.0, ans=0.1
2023-12-24 07:40:53,317 INFO [train.py:886] (1/4) Epoch 50, batch 2750, loss[loss=0.009173, audio_tagging_loss=0.009173, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4956902.43 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:41:06,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=12.0
2023-12-24 07:41:16,431 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.715e+01 4.042e+01 4.295e+01 4.516e+01 5.122e+01, threshold=8.590e+01, percent-clipped=0.0
2023-12-24 07:41:19,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1575360.0, ans=0.1
2023-12-24 07:41:20,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1575360.0, ans=0.125
2023-12-24 07:41:28,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.88 vs. limit=10.0
2023-12-24 07:41:39,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0
2023-12-24 07:41:41,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1575493.3333333333, ans=0.0
2023-12-24 07:41:45,223 INFO [train.py:886] (1/4) Epoch 50, batch 2800, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4957567.91 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:41:51,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1575560.0, ans=0.0
2023-12-24 07:41:58,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1575626.6666666667, ans=0.95
2023-12-24 07:42:09,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1575693.3333333333, ans=0.125
2023-12-24 07:42:13,775 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:42:20,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0
2023-12-24 07:42:38,582 INFO [train.py:886] (1/4) Epoch 50, batch 2850, loss[loss=0.009965, audio_tagging_loss=0.009965, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4949771.16 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:42:46,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1575893.3333333333, ans=0.2
2023-12-24 07:43:01,387 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.781e+01 4.104e+01 4.361e+01 4.546e+01 5.152e+01, threshold=8.721e+01, percent-clipped=0.0
2023-12-24 07:43:02,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1576026.6666666667, ans=0.125
2023-12-24 07:43:04,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1576026.6666666667, ans=0.125
2023-12-24 07:43:09,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=12.0
limit=12.0 2023-12-24 07:43:17,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1576093.3333333333, ans=0.125 2023-12-24 07:43:25,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1576160.0, ans=0.2 2023-12-24 07:43:28,303 INFO [train.py:886] (1/4) Epoch 50, batch 2900, loss[loss=0.007673, audio_tagging_loss=0.007673, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4947187.42 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:43:57,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1576360.0, ans=0.125 2023-12-24 07:44:14,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1576493.3333333333, ans=0.2 2023-12-24 07:44:20,092 INFO [train.py:886] (1/4) Epoch 50, batch 2950, loss[loss=0.01006, audio_tagging_loss=0.01006, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4949426.93 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:44:27,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-12-24 07:44:37,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1576626.6666666667, ans=0.125 2023-12-24 07:44:42,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0 2023-12-24 07:44:44,665 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.679e+01 4.048e+01 4.207e+01 4.410e+01 5.096e+01, threshold=8.415e+01, percent-clipped=0.0 2023-12-24 07:44:53,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1576760.0, ans=0.125 2023-12-24 07:45:12,383 INFO [train.py:886] (1/4) Epoch 50, batch 3000, loss[loss=0.008424, audio_tagging_loss=0.008424, over 25000.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4949508.08 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:45:12,384 INFO [train.py:909] (1/4) Computing validation loss 2023-12-24 07:45:28,080 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5477, 3.3978, 3.9962, 4.2667], device='cuda:1') 2023-12-24 07:45:33,532 INFO [train.py:917] (1/4) Epoch 50, validation: loss=0.03799, audio_tagging_loss=0.03799, over 3737520.00 frames. 2023-12-24 07:45:33,533 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14765MB 2023-12-24 07:45:34,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2023-12-24 07:45:37,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. 
limit=6.0 2023-12-24 07:45:50,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1576960.0, ans=0.0 2023-12-24 07:46:15,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2023-12-24 07:46:23,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2023-12-24 07:46:25,136 INFO [train.py:886] (1/4) Epoch 50, batch 3050, loss[loss=0.008741, audio_tagging_loss=0.008741, over 21716.00 frames. ], tot_loss[loss=0.01039, audio_tagging_loss=0.01039, over 4951769.50 frames. ], batch size: 107, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:46:27,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1577226.6666666667, ans=0.125 2023-12-24 07:46:30,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1577226.6666666667, ans=0.1 2023-12-24 07:46:46,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-12-24 07:46:49,324 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.806e+01 4.000e+01 4.179e+01 4.391e+01 4.830e+01, threshold=8.357e+01, percent-clipped=0.0 2023-12-24 07:47:06,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-12-24 07:47:09,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1577493.3333333333, ans=0.125 2023-12-24 07:47:15,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1577560.0, ans=0.125 2023-12-24 07:47:16,873 INFO [train.py:886] (1/4) Epoch 50, batch 3100, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4954180.85 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:47:20,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. 
limit=6.0 2023-12-24 07:47:32,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1577626.6666666667, ans=0.0 2023-12-24 07:47:33,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1577626.6666666667, ans=0.0 2023-12-24 07:47:47,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1577760.0, ans=0.125 2023-12-24 07:48:00,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1577826.6666666667, ans=0.0 2023-12-24 07:48:06,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1577893.3333333333, ans=0.125 2023-12-24 07:48:07,474 INFO [train.py:886] (1/4) Epoch 50, batch 3150, loss[loss=0.0102, audio_tagging_loss=0.0102, over 23969.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4948647.12 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:48:07,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1577893.3333333333, ans=0.125 2023-12-24 07:48:09,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1577893.3333333333, ans=0.2 2023-12-24 07:48:17,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1577960.0, ans=0.125 2023-12-24 07:48:27,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1577960.0, ans=0.125 2023-12-24 07:48:28,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1578026.6666666667, ans=0.125 2023-12-24 07:48:31,862 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.715e+01 4.159e+01 4.326e+01 4.547e+01 5.411e+01, threshold=8.653e+01, percent-clipped=0.0 2023-12-24 07:48:32,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-12-24 07:48:44,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1578093.3333333333, ans=0.125 2023-12-24 07:48:52,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1578160.0, ans=0.125 2023-12-24 07:48:55,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1578160.0, ans=0.1 2023-12-24 07:48:58,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1578160.0, ans=0.125 2023-12-24 07:49:00,294 INFO [train.py:886] (1/4) Epoch 50, batch 3200, loss[loss=0.0103, audio_tagging_loss=0.0103, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4945628.51 frames. 
], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:49:13,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1578293.3333333333, ans=15.0 2023-12-24 07:49:19,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1578360.0, ans=0.2 2023-12-24 07:49:29,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.21 vs. limit=15.0 2023-12-24 07:49:30,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1578426.6666666667, ans=0.125 2023-12-24 07:49:35,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=22.5 2023-12-24 07:49:46,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1578493.3333333333, ans=0.125 2023-12-24 07:49:52,067 INFO [train.py:886] (1/4) Epoch 50, batch 3250, loss[loss=0.008754, audio_tagging_loss=0.008754, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4945560.35 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:49:56,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1578560.0, ans=0.0 2023-12-24 07:49:58,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-12-24 07:50:10,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1578626.6666666667, ans=0.0 2023-12-24 07:50:15,401 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 4.040e+01 4.194e+01 4.403e+01 5.112e+01, threshold=8.389e+01, percent-clipped=0.0 2023-12-24 07:50:15,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1578693.3333333333, ans=0.1 2023-12-24 07:50:16,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1578693.3333333333, ans=0.125 2023-12-24 07:50:28,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1578760.0, ans=0.0 2023-12-24 07:50:34,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1578826.6666666667, ans=0.0 2023-12-24 07:50:44,532 INFO [train.py:886] (1/4) Epoch 50, batch 3300, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4948012.34 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:50:50,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1578893.3333333333, ans=0.05 2023-12-24 07:50:58,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1578960.0, ans=0.125 2023-12-24 07:50:59,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1578960.0, ans=0.1 2023-12-24 07:51:04,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1579026.6666666667, ans=0.0 2023-12-24 07:51:20,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1579093.3333333333, ans=0.1 2023-12-24 07:51:26,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1579160.0, ans=0.5 2023-12-24 07:51:28,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-12-24 07:51:28,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1579160.0, ans=0.0 2023-12-24 07:51:36,592 INFO [train.py:886] (1/4) Epoch 50, batch 3350, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4947211.66 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:51:36,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1579226.6666666667, ans=0.125 2023-12-24 07:51:53,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1579293.3333333333, ans=0.125 2023-12-24 07:51:59,929 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.696e+01 4.063e+01 4.248e+01 4.414e+01 5.248e+01, threshold=8.495e+01, percent-clipped=0.0 2023-12-24 07:52:00,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1579360.0, ans=0.125 2023-12-24 07:52:01,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1579360.0, ans=0.2 2023-12-24 07:52:02,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1579360.0, ans=0.2 2023-12-24 07:52:20,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1579493.3333333333, ans=0.2 2023-12-24 07:52:27,571 INFO [train.py:886] (1/4) Epoch 50, batch 3400, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4950460.88 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:52:40,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0 2023-12-24 07:52:51,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. 
limit=15.0 2023-12-24 07:52:56,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2023-12-24 07:52:58,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1579760.0, ans=0.0 2023-12-24 07:53:20,143 INFO [train.py:886] (1/4) Epoch 50, batch 3450, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4947348.49 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:53:37,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1579960.0, ans=0.1 2023-12-24 07:53:42,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1580026.6666666667, ans=0.0 2023-12-24 07:53:45,039 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.673e+01 4.041e+01 4.285e+01 4.464e+01 5.704e+01, threshold=8.570e+01, percent-clipped=0.0 2023-12-24 07:53:54,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1580093.3333333333, ans=0.0 2023-12-24 07:53:54,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1580093.3333333333, ans=0.0 2023-12-24 07:53:56,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-12-24 07:54:13,396 INFO [train.py:886] (1/4) Epoch 50, batch 3500, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4947833.71 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:54:13,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1580226.6666666667, ans=0.125 2023-12-24 07:54:19,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1580226.6666666667, ans=0.0 2023-12-24 07:54:22,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1580293.3333333333, ans=0.1 2023-12-24 07:54:25,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1580293.3333333333, ans=0.125 2023-12-24 07:54:34,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1580360.0, ans=0.07 2023-12-24 07:55:04,407 INFO [train.py:886] (1/4) Epoch 50, batch 3550, loss[loss=0.009409, audio_tagging_loss=0.009409, over 24925.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4952044.61 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:55:28,356 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.036e+01 4.198e+01 4.391e+01 5.355e+01, threshold=8.396e+01, percent-clipped=0.0 2023-12-24 07:55:44,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1580760.0, ans=0.07 2023-12-24 07:55:47,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-12-24 07:55:50,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1580826.6666666667, ans=0.125 2023-12-24 07:55:56,995 INFO [train.py:886] (1/4) Epoch 50, batch 3600, loss[loss=0.009164, audio_tagging_loss=0.009164, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4957196.01 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:55:59,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1580893.3333333333, ans=0.1 2023-12-24 07:56:01,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1580893.3333333333, ans=0.0 2023-12-24 07:56:06,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-24 07:56:19,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1581026.6666666667, ans=0.125 2023-12-24 07:56:26,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=12.0 2023-12-24 07:56:28,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1581093.3333333333, ans=0.125 2023-12-24 07:56:30,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1581093.3333333333, ans=0.025 2023-12-24 07:56:48,314 INFO [train.py:886] (1/4) Epoch 50, batch 3650, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4961342.49 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:56:48,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1581226.6666666667, ans=0.2 2023-12-24 07:57:11,804 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.587e+01 3.982e+01 4.158e+01 4.360e+01 5.165e+01, threshold=8.317e+01, percent-clipped=0.0 2023-12-24 07:57:40,464 INFO [train.py:886] (1/4) Epoch 50, batch 3700, loss[loss=0.00974, audio_tagging_loss=0.00974, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4964338.02 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 16.0 2023-12-24 07:57:41,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1581560.0, ans=0.0 2023-12-24 07:57:44,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1581560.0, ans=0.0 2023-12-24 07:57:47,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2023-12-24 07:57:49,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1581560.0, ans=0.125 2023-12-24 07:58:05,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1581693.3333333333, ans=0.0 2023-12-24 07:58:08,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1581693.3333333333, ans=0.125 2023-12-24 07:58:12,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1581760.0, ans=0.1 2023-12-24 07:58:22,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1581826.6666666667, ans=0.1 2023-12-24 07:58:27,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1581826.6666666667, ans=0.125 2023-12-24 07:58:30,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1581826.6666666667, ans=0.125 2023-12-24 07:58:33,655 INFO [train.py:886] (1/4) Epoch 50, batch 3750, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4963870.35 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 07:58:33,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1581893.3333333333, ans=0.125 2023-12-24 07:58:37,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1581893.3333333333, ans=0.2 2023-12-24 07:58:57,810 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.781e+01 4.161e+01 4.351e+01 4.483e+01 8.860e+01, threshold=8.701e+01, percent-clipped=1.0 2023-12-24 07:59:01,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1582026.6666666667, ans=0.125 2023-12-24 07:59:08,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1582093.3333333333, ans=0.1 2023-12-24 07:59:10,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1582093.3333333333, ans=0.125 2023-12-24 07:59:24,595 INFO [train.py:886] (1/4) Epoch 50, batch 3800, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4956861.76 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 07:59:26,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.38 vs. 
limit=22.5 2023-12-24 07:59:27,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-12-24 07:59:28,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1582226.6666666667, ans=0.125 2023-12-24 07:59:31,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1582226.6666666667, ans=0.2 2023-12-24 07:59:45,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1582360.0, ans=0.07 2023-12-24 07:59:52,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1582360.0, ans=0.1 2023-12-24 07:59:59,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1582426.6666666667, ans=0.125 2023-12-24 08:00:14,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-12-24 08:00:15,256 INFO [train.py:886] (1/4) Epoch 50, batch 3850, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4954313.16 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 08:00:17,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1582560.0, ans=0.125 2023-12-24 08:00:17,323 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 08:00:26,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1582626.6666666667, ans=0.0 2023-12-24 08:00:37,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1582693.3333333333, ans=0.125 2023-12-24 08:00:40,108 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.778e+01 4.086e+01 4.269e+01 4.439e+01 5.542e+01, threshold=8.539e+01, percent-clipped=0.0 2023-12-24 08:01:02,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1582826.6666666667, ans=0.0 2023-12-24 08:01:06,018 INFO [train.py:886] (1/4) Epoch 50, batch 3900, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4951841.53 frames. 
], batch size: 100, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 08:01:07,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1582893.3333333333, ans=0.0 2023-12-24 08:01:22,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1582960.0, ans=0.0 2023-12-24 08:01:26,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1583026.6666666667, ans=0.125 2023-12-24 08:01:29,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1583026.6666666667, ans=0.0 2023-12-24 08:01:34,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-12-24 08:01:56,766 INFO [train.py:886] (1/4) Epoch 50, batch 3950, loss[loss=0.00695, audio_tagging_loss=0.00695, over 24750.00 frames. ], tot_loss[loss=0.01045, audio_tagging_loss=0.01045, over 4947343.44 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0 2023-12-24 08:02:02,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1583226.6666666667, ans=0.0 2023-12-24 08:02:15,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.12 vs. limit=10.0 2023-12-24 08:02:21,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=22.5 2023-12-24 08:02:22,396 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.534e+01 4.014e+01 4.237e+01 4.371e+01 9.981e+01, threshold=8.474e+01, percent-clipped=1.0 2023-12-24 08:02:23,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1583360.0, ans=0.125 2023-12-24 08:02:39,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1583493.3333333333, ans=0.125 2023-12-24 08:02:50,042 INFO [train.py:886] (1/4) Epoch 50, batch 4000, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4945532.57 frames. 
], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:02:53,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1583560.0, ans=0.1 2023-12-24 08:03:03,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1583626.6666666667, ans=0.1 2023-12-24 08:03:22,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1583760.0, ans=0.1 2023-12-24 08:03:26,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1583760.0, ans=0.125 2023-12-24 08:03:28,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1583760.0, ans=0.2 2023-12-24 08:03:32,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1583826.6666666667, ans=0.05 2023-12-24 08:03:40,187 INFO [train.py:886] (1/4) Epoch 50, batch 4050, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4952069.77 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:04:05,082 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.755e+01 4.142e+01 4.284e+01 4.513e+01 5.002e+01, threshold=8.568e+01, percent-clipped=0.0 2023-12-24 08:04:05,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1584026.6666666667, ans=0.0 2023-12-24 08:04:18,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1584093.3333333333, ans=0.125 2023-12-24 08:04:18,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1584093.3333333333, ans=0.125 2023-12-24 08:04:31,974 INFO [train.py:886] (1/4) Epoch 50, batch 4100, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4952694.29 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:04:32,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2023-12-24 08:04:54,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1584360.0, ans=0.0 2023-12-24 08:04:56,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1584360.0, ans=0.0 2023-12-24 08:05:00,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. 
limit=15.0 2023-12-24 08:05:04,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1584426.6666666667, ans=0.0 2023-12-24 08:05:05,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1584426.6666666667, ans=0.125 2023-12-24 08:05:24,707 INFO [train.py:886] (1/4) Epoch 50, batch 4150, loss[loss=0.01091, audio_tagging_loss=0.01091, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4954347.50 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:05:28,407 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 08:05:29,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1584560.0, ans=0.0 2023-12-24 08:05:35,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1584626.6666666667, ans=0.0 2023-12-24 08:05:36,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5 2023-12-24 08:05:40,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1584626.6666666667, ans=0.07 2023-12-24 08:05:46,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1584693.3333333333, ans=0.1 2023-12-24 08:05:48,086 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.703e+01 4.057e+01 4.232e+01 4.457e+01 4.913e+01, threshold=8.465e+01, percent-clipped=0.0 2023-12-24 08:05:59,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1584760.0, ans=0.125 2023-12-24 08:06:02,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1584760.0, ans=0.125 2023-12-24 08:06:09,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1584826.6666666667, ans=0.05 2023-12-24 08:06:11,962 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 08:06:15,662 INFO [train.py:886] (1/4) Epoch 50, batch 4200, loss[loss=0.008804, audio_tagging_loss=0.008804, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4953494.10 frames. 
], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:06:25,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1584960.0, ans=0.015 2023-12-24 08:06:32,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1584960.0, ans=0.125 2023-12-24 08:06:36,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1585026.6666666667, ans=0.1 2023-12-24 08:06:43,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1585026.6666666667, ans=0.125 2023-12-24 08:06:45,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1585026.6666666667, ans=0.0 2023-12-24 08:06:59,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1585160.0, ans=0.0 2023-12-24 08:07:08,498 INFO [train.py:886] (1/4) Epoch 50, batch 4250, loss[loss=0.01099, audio_tagging_loss=0.01099, over 25000.00 frames. ], tot_loss[loss=0.01034, audio_tagging_loss=0.01034, over 4957208.04 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:07:12,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1585226.6666666667, ans=0.2 2023-12-24 08:07:28,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1585360.0, ans=0.1 2023-12-24 08:07:32,777 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.753e+01 4.073e+01 4.229e+01 4.382e+01 5.254e+01, threshold=8.458e+01, percent-clipped=0.0 2023-12-24 08:07:33,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1585360.0, ans=0.125 2023-12-24 08:07:33,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1585360.0, ans=0.2 2023-12-24 08:07:40,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1585426.6666666667, ans=0.125 2023-12-24 08:07:50,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1585493.3333333333, ans=0.125 2023-12-24 08:07:50,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1585493.3333333333, ans=0.1 2023-12-24 08:07:53,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2023-12-24 08:07:58,901 INFO [train.py:886] (1/4) Epoch 50, batch 4300, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.0103, audio_tagging_loss=0.0103, over 4962717.55 frames. 
], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:07:59,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1585560.0, ans=0.125 2023-12-24 08:08:08,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1585560.0, ans=0.0 2023-12-24 08:08:42,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1585826.6666666667, ans=0.125 2023-12-24 08:08:51,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.58 vs. limit=10.0 2023-12-24 08:08:52,020 INFO [train.py:886] (1/4) Epoch 50, batch 4350, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4963086.84 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:09:12,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=15.0 2023-12-24 08:09:16,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2023-12-24 08:09:16,919 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.615e+01 4.119e+01 4.307e+01 4.459e+01 5.187e+01, threshold=8.614e+01, percent-clipped=0.0 2023-12-24 08:09:20,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1586026.6666666667, ans=0.0 2023-12-24 08:09:27,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1586093.3333333333, ans=0.125 2023-12-24 08:09:27,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2023-12-24 08:09:33,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2023-12-24 08:09:37,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1586160.0, ans=0.125 2023-12-24 08:09:44,419 INFO [train.py:886] (1/4) Epoch 50, batch 4400, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4954239.89 frames. 
], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:09:45,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1586226.6666666667, ans=0.04949747468305833 2023-12-24 08:10:13,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1586360.0, ans=0.125 2023-12-24 08:10:18,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1586426.6666666667, ans=0.125 2023-12-24 08:10:19,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1586426.6666666667, ans=0.1 2023-12-24 08:10:32,160 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 08:10:33,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1586493.3333333333, ans=0.125 2023-12-24 08:10:35,810 INFO [train.py:886] (1/4) Epoch 50, batch 4450, loss[loss=0.008715, audio_tagging_loss=0.008715, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4954113.28 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:10:36,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2023-12-24 08:10:37,843 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 08:10:37,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1586560.0, ans=0.0 2023-12-24 08:10:48,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1586626.6666666667, ans=0.0 2023-12-24 08:11:00,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1586693.3333333333, ans=0.07 2023-12-24 08:11:01,845 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.798e+01 4.098e+01 4.310e+01 4.508e+01 5.882e+01, threshold=8.619e+01, percent-clipped=0.0 2023-12-24 08:11:10,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1586760.0, ans=0.125 2023-12-24 08:11:28,298 INFO [train.py:886] (1/4) Epoch 50, batch 4500, loss[loss=0.009296, audio_tagging_loss=0.009296, over 24750.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4954644.07 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:11:40,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1586960.0, ans=0.1 2023-12-24 08:11:58,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1587093.3333333333, ans=0.125 2023-12-24 08:12:00,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. 
limit=15.0 2023-12-24 08:12:11,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1587160.0, ans=0.2 2023-12-24 08:12:11,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1587160.0, ans=0.0 2023-12-24 08:12:20,254 INFO [train.py:886] (1/4) Epoch 50, batch 4550, loss[loss=0.009655, audio_tagging_loss=0.009655, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4955152.77 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:12:28,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1587226.6666666667, ans=0.2 2023-12-24 08:12:31,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-12-24 08:12:32,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2023-12-24 08:12:44,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2023-12-24 08:12:44,700 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.783e+01 4.050e+01 4.235e+01 4.395e+01 5.112e+01, threshold=8.470e+01, percent-clipped=0.0 2023-12-24 08:12:51,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1587426.6666666667, ans=0.125 2023-12-24 08:12:55,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1587426.6666666667, ans=0.2 2023-12-24 08:13:04,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2023-12-24 08:13:12,189 INFO [train.py:886] (1/4) Epoch 50, batch 4600, loss[loss=0.009266, audio_tagging_loss=0.009266, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4956704.82 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:13:14,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1587560.0, ans=0.125 2023-12-24 08:13:20,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1587626.6666666667, ans=0.0 2023-12-24 08:13:46,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1587760.0, ans=0.1 2023-12-24 08:13:57,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1587826.6666666667, ans=0.2 2023-12-24 08:14:01,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1587826.6666666667, ans=0.07 2023-12-24 08:14:04,479 INFO [train.py:886] (1/4) Epoch 50, batch 4650, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24941.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4960012.94 frames. 
], batch size: 100, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:14:13,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1587960.0, ans=0.2 2023-12-24 08:14:13,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1587960.0, ans=0.0 2023-12-24 08:14:28,732 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.571e+01 4.073e+01 4.249e+01 4.503e+01 5.611e+01, threshold=8.499e+01, percent-clipped=0.0 2023-12-24 08:14:37,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1588093.3333333333, ans=0.0 2023-12-24 08:14:41,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1588093.3333333333, ans=0.125 2023-12-24 08:14:45,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1588160.0, ans=0.125 2023-12-24 08:14:54,263 INFO [train.py:886] (1/4) Epoch 50, batch 4700, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4956863.74 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:15:02,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2023-12-24 08:15:09,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1588293.3333333333, ans=0.1 2023-12-24 08:15:21,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-12-24 08:15:23,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2023-12-24 08:15:42,037 INFO [train.py:886] (1/4) Epoch 50, batch 4750, loss[loss=0.009977, audio_tagging_loss=0.009977, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4949899.32 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0 2023-12-24 08:15:44,167 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 08:15:53,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1588626.6666666667, ans=0.5 2023-12-24 08:15:57,359 INFO [train.py:1099] (1/4) Done!